Consensus of Multi-agent Reinforcement Learning Systems: The Effect of Immediate Rewards

Neshat Elhami Fard, Rastko Selmic

Abstract


This paper studies the consensus problem of a leaderless, homogeneous, multi-agent reinforcement learning (MARL) system using actor-critic algorithms, both with and without malicious agents. The goal of each agent is to reach the consensus position with the maximum cumulative reward. Although the reward function converges in both scenarios, the cumulative reward is higher in the absence of a malicious agent than when one is present. We consider various immediate reward functions. First, we study an immediate reward function based on the Manhattan distance. In addition to proposing three further immediate reward functions based on the Euclidean, $n$-norm, and Chebyshev distances, we rigorously show which method performs best in terms of the cumulative reward for each agent and for the entire team of agents. Finally, we present a combination of immediate reward functions that yields a higher cumulative reward for each agent and for the team. By increasing the agents' cumulative reward with the combined immediate reward function, we demonstrate that the cumulative team reward in the presence of a malicious agent is comparable to that in its absence. The claims are proven theoretically, and simulations confirm the theoretical findings.
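The paper's exact reward definitions are not reproduced on this page. As a minimal sketch, assuming the immediate reward is the negative distance between an agent's position and the consensus position, the four distance metrics named in the abstract, and one possible combination of them, could look as follows (the function names, the `p` parameter of the $n$-norm, and the averaging used in `combined_reward` are all illustrative assumptions, not the authors' formulation):

```python
import numpy as np

def immediate_reward(pos, target, metric="euclidean", p=3):
    """Hypothetical immediate reward: negative distance to the consensus target.

    Illustrates the four metrics named in the abstract; the paper's actual
    reward functions may differ in scaling and sign conventions.
    """
    diff = np.asarray(pos, dtype=float) - np.asarray(target, dtype=float)
    if metric == "manhattan":        # 1-norm
        d = np.sum(np.abs(diff))
    elif metric == "euclidean":      # 2-norm
        d = np.sqrt(np.sum(diff ** 2))
    elif metric == "n_norm":         # general p-norm; p is an assumed parameter
        d = np.sum(np.abs(diff) ** p) ** (1.0 / p)
    elif metric == "chebyshev":      # infinity-norm
        d = np.max(np.abs(diff))
    else:
        raise ValueError(f"unknown metric: {metric}")
    return -d

def combined_reward(pos, target, metrics=("manhattan", "euclidean", "chebyshev")):
    """One plausible way to combine several immediate rewards (assumption):
    average the per-metric rewards."""
    return float(np.mean([immediate_reward(pos, target, m) for m in metrics]))
```

For a displacement of (3, 4) from the target, the Manhattan, Euclidean, and Chebyshev variants give rewards of -7, -5, and -4 respectively, so the choice of metric directly shifts the cumulative reward an agent can accumulate on the way to consensus.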


Keywords


Multi-agent system; Malicious agent; Consensus control; Reinforcement learning; Immediate reward; Cumulative reward



DOI: https://doi.org/10.18196/jrc.v3i2.13082



Copyright (c) 2022 Neshat Elhami Fard, Rastko Selmic

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



Journal of Robotics and Control (JRC)

P-ISSN: 2715-5056 || E-ISSN: 2715-5072
Organized by Peneliti Teknologi Teknik Indonesia
Published by Universitas Muhammadiyah Yogyakarta in collaboration with Peneliti Teknologi Teknik Indonesia, Indonesia and the Department of Electrical Engineering
Website: http://journal.umy.ac.id/index.php/jrc
Email: jrcofumy@gmail.com

