Deep Q-Network-Based Path Planning in a Simulated Warehouse Environment with SLAM Map Integration and Dynamic Obstacles

Authors

H. Medagangoda, N. Jayawickrama, R. de Silva, U. S. K. Rajapaksha, and P. K. Abeygunawardhana

DOI:

https://doi.org/10.18196/jrc.v6i5.27579

Keywords:

Deep Q-Networks, Path Planning, Simultaneous Localization and Mapping, Robot Operating System, Gazebo Simulation

Abstract

With the rise of e-commerce and the evolution of robotic technologies, autonomous navigation within warehouse environments has received increasing attention. This study presents a simulation-based framework for path planning using Deep Q-Networks (DQN) in a warehouse environment modeled with moving obstacles. The proposed solution integrates a prebuilt map of the environment generated using Simultaneous Localization and Mapping (SLAM), which provides prior spatial knowledge of static obstacles. The reinforcement learning problem is formulated with a state space derived from grayscale images that combine the static SLAM map with dynamic obstacles observed in real time, and an action space of four discrete movements. A reward shaping strategy combines a distance-based reward with a collision penalty to encourage goal-reaching and discourage collisions, and an epsilon-greedy policy with exponential decay balances exploration and exploitation. The system was implemented in the Robot Operating System (ROS) and the Gazebo simulation environment. The agent was trained for 1000 episodes, and metrics such as the number of actions executed to reach the goal and the cumulative reward per episode were analyzed to evaluate convergence. Results across two goal locations show that incorporating the SLAM map enhances learning stability: the agent reached the goal in approximately 150 episodes, nearly double the 80 successful episodes achieved by the baseline without map information over the same number of episodes. This indicates faster convergence and reduced exploration overhead due to improved spatial awareness.
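
To make the exploration and reward-shaping scheme described above concrete, the Python sketch below shows one plausible implementation of an epsilon-greedy policy with exponential decay and a distance-based reward with a collision penalty. It is a minimal illustration only: all constants and function names (EPSILON_START, DECAY_RATE, shaped_reward, and so on) are assumptions, since the paper does not publish its hyperparameters or code.

import math
import random

# Illustrative placeholder values; the paper does not report its exact
# hyperparameters or reward magnitudes.
EPSILON_START = 1.0         # initial exploration rate
EPSILON_MIN = 0.05          # floor for the exploration rate
DECAY_RATE = 0.005          # exponential decay constant per episode
GOAL_REWARD = 100.0         # bonus when the goal is reached
COLLISION_PENALTY = -100.0  # penalty when the agent collides with an obstacle
DISTANCE_SCALE = 1.0        # weight of the distance-based shaping term


def epsilon(episode: int) -> float:
    """Exploration rate with exponential decay over training episodes."""
    return EPSILON_MIN + (EPSILON_START - EPSILON_MIN) * math.exp(-DECAY_RATE * episode)


def select_action(q_values: list, episode: int) -> int:
    """Epsilon-greedy choice over the four discrete movements."""
    if random.random() < epsilon(episode):
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def shaped_reward(prev_dist: float, new_dist: float,
                  collided: bool, reached_goal: bool) -> float:
    """Distance-based shaping with a collision penalty and a goal bonus."""
    if collided:
        return COLLISION_PENALTY
    if reached_goal:
        return GOAL_REWARD
    return DISTANCE_SCALE * (prev_dist - new_dist)  # reward progress toward the goal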

Published

2025-09-19

How to Cite

[1] H. Medagangoda, N. Jayawickrama, R. de Silva, U. S. K. Rajapaksha, and P. K. Abeygunawardhana, “Deep Q-Network-Based Path Planning in a Simulated Warehouse Environment with SLAM Map Integration and Dynamic Obstacles”, J Robot Control (JRC), vol. 6, no. 5, pp. 2284–2294, Sep. 2025.
