Formation Control of Multiple Unmanned Aerial Vehicle Systems using Integral Reinforcement Learning

Ngoc Trung Dang, Quynh Nga Duong

Abstract


Formation control of Unmanned Aerial Vehicles (UAVs), especially quadrotors, has many practical applications, including contour mapping, transportation, and search and rescue. This article addresses the formation tracking problem for a group of UAVs by combining a formation control design in the outer loop with integral Reinforcement Learning (RL) algorithms in the position sub-system. First, we present the formation tracking control structure, which uses a cascade description to account for the model separation of each UAV. Second, based on the value function of the inner model, a modified iteration algorithm is derived to obtain the optimal controller in the presence of a discount factor, which must be introduced so that the infinite-horizon cost function remains finite. Third, an integral RL controller is developed to handle the dynamic uncertainties of the attitude sub-systems in the UAV formation control scheme, again using a discount factor in the infinite-horizon cost function. The advantage of the proposed control is demonstrated not only for the formation tracking problem but also in terms of optimality. Finally, simulation results validate the proposed formation tracking control of a group of multiple UAVs.
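To make the role of the discount factor concrete, the sketch below runs a discounted policy iteration for a continuous-time linear-quadratic problem on an assumed double-integrator model of a single position axis. This is only an illustration of the kind of value-function iteration the abstract refers to: the article's integral RL scheme is data-based and does not require the model matrices, and every value below (A, B, Q, R, the discount factor gamma, and the initial gain K) is a placeholder assumption rather than a quantity taken from the article.

# Minimal sketch (assumed model, not the paper's data-based algorithm):
# discounted infinite-horizon cost  integral_0^inf e^(-gamma*t) (x'Qx + u'Ru) dt
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])        # double-integrator position axis (assumption)
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                      # state weight (assumption)
R = np.array([[1.0]])              # input weight (assumption)
gamma = 0.1                        # discount factor that keeps the cost finite
K = np.array([[1.0, 1.0]])         # initial stabilizing feedback gain (assumption)

for _ in range(50):
    # Policy evaluation: solve
    #   (A_cl - (gamma/2) I)' P + P (A_cl - (gamma/2) I) = -(Q + K' R K)
    A_cl = A - B @ K - 0.5 * gamma * np.eye(2)
    P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B' P
    K_next = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_next - K) < 1e-10:
        K = K_next
        break
    K = K_next

print("discounted optimal gain:", K)

In the article's setting, the same fixed point is sought without knowledge of A and B, by evaluating the value function along measured trajectories over short integration intervals, which is the "integral" in integral RL.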

Keywords


Integral Reinforcement Learning (RL); Unmanned Aerial Vehicles (UAVs); Formation Control; Approximate/Adaptive Dynamic Programming (ADP); Model-Free Based Control.

DOI: https://doi.org/10.18196/jrc.v5i6.23505



Copyright (c) 2024 Ngoc Trung Dang, Quynh Nga Duong

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

 


Journal of Robotics and Control (JRC)

P-ISSN: 2715-5056 || E-ISSN: 2715-5072
Organized by Peneliti Teknologi Teknik Indonesia
Published by Universitas Muhammadiyah Yogyakarta in collaboration with Peneliti Teknologi Teknik Indonesia (Indonesia) and the Department of Electrical Engineering
Website: http://journal.umy.ac.id/index.php/jrc
Email: jrcofumy@gmail.com

