Computer Science and Information Systems 2024 Volume 21, Issue 1, Pages: 335-362
https://doi.org/10.2298/CSIS221210071A


Knowledge transfer in multi-objective multi-agent reinforcement learning via generalized policy improvement

de Almeida Vicente N. (Instituto de Informática, Universidade Federal do Rio Grande do Sul (UFRGS) Porto Alegre, Brazil), [email protected]
Alegre Lucas N. (Instituto de Informática, Universidade Federal do Rio Grande do Sul (UFRGS) Porto Alegre, Brazil), [email protected]
Bazzan Ana L.C. (Instituto de Informática, Universidade Federal do Rio Grande do Sul (UFRGS) Porto Alegre, Brazil), [email protected]

Even though many real-world problems are inherently distributed and multi-objective, most of the reinforcement learning (RL) literature deals with single agents and single objectives. While some of these problems can be solved using a single-agent, single-objective RL solution (e.g., by specifying preferences over objectives), such solutions raise robustness issues; moreover, preferences may change over time, or it might not even be possible to set them. Therefore, a need arises for a way to train multiple agents for any given preference distribution over the objectives. This work thus proposes a multi-objective multi-agent reinforcement learning (MOMARL) method in which agents build a shared set of policies during training, in a decentralized way, and then combine these policies using a generalization of policy improvement and policy evaluation (fundamental operations of RL algorithms) to generate effective behaviors for any possible preference distribution, without requiring any additional training. This method is applied to two application scenarios: a multi-agent extension of a domain commonly used in the related literature, and traffic signal control, which is more complex, inherently distributed, and multi-objective (the flows of both vehicles and pedestrians are considered). Results show that the approach effectively and efficiently generates behaviors for the agents, given any preference over the objectives.

Keywords: reinforcement learning, multi-agent systems, multi-objective decision making, generalized policy improvement, traffic signal control
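
The idea of combining a shared set of policies through generalized policy improvement (GPI) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes tabular, vector-valued Q-functions and a linear preference (weight) vector over the objectives, and the names `gpi_action`, `q_set`, and `w` are hypothetical. For a given state, the agent scalarizes each stored policy's Q-values with the preference vector and acts greedily with respect to the best value any policy offers.

```python
import numpy as np

def gpi_action(q_set, state, w):
    """Pick an action by generalized policy improvement (illustrative sketch).

    q_set : array of shape (n_policies, n_states, n_actions, n_objectives),
            hypothetical multi-objective Q-values of the shared policy set.
    state : integer index of the current state.
    w     : array of shape (n_objectives,), assumed linear preference weights.
    """
    # Scalarize every policy's Q-vector for this state: (n_policies, n_actions).
    scalarized = q_set[:, state] @ w
    # For each action take the best value over all policies, then act greedily.
    return int(np.argmax(scalarized.max(axis=0)))

# Two policies, one state, two actions, two objectives (made-up numbers).
q_set = np.array([
    [[[1.0, 0.0], [0.0, 0.0]]],  # policy 0 is good at objective 0 via action 0
    [[[0.0, 0.0], [0.0, 1.0]]],  # policy 1 is good at objective 1 via action 1
])
action_for_first_objective = gpi_action(q_set, 0, np.array([1.0, 0.0]))
action_for_second_objective = gpi_action(q_set, 0, np.array([0.0, 1.0]))
```

Changing only `w` changes the selected action, with no retraining of the stored Q-values, which is the property the abstract refers to as generating behaviors for any preference without additional training.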

