Visual SLAM and Visual Odometry Based on RGB-D Images Using Deep Learning: A Survey

Van-Hung Le

Abstract


Visual simultaneous localization and mapping (Visual SLAM) based on RGB-D images comprises two main tasks: building a map of the environment and simultaneously tracking the location/motion trajectory of the image sensor, the latter also called visual odometry (VO). Visual SLAM and VO are used in many applications, such as robot systems, autonomous mobile robots, support systems for the blind, human-machine interaction, and industry. With the strong development of deep learning (DL), DL has been applied to building Visual SLAM and VO systems from image sensor data (RGB-D images) and has produced impressive results. This survey aims to give an overall picture of the development of DL applied to building Visual SLAM and VO systems, together with the results, challenges, and advantages of DL models in solving Visual SLAM and VO problems. In this paper, we propose a taxonomy to conduct a complete survey based on three approaches using RGB-D images: (1) using DL for the modules of the Visual SLAM and VO framework (depth estimation, optical flow estimation, visual odometry, mapping, and loop closure detection); (2) using DL modules to supplement the Visual SLAM and VO framework (feature extraction, semantic segmentation, pose estimation, map construction, loop closure detection, and other modules); (3) using end-to-end DL to build Visual SLAM and VO systems. The surveyed studies are organized by method, dataset, and evaluation measure, and detailed results on each dataset are also presented. In particular, the challenges faced by studies using DL to build Visual SLAM and VO systems are analyzed, and some of our planned further studies are introduced.
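
As a brief illustration of the evaluation measures referred to above (a sketch, assuming the surveyed works follow the common convention of the TUM RGB-D benchmark of Sturm et al.), estimated camera trajectories are usually compared with the absolute trajectory error (ATE) and the relative pose error (RPE). With estimated poses $P_i \in \mathrm{SE}(3)$, ground-truth poses $Q_i$, a rigid-body (or similarity) alignment $S$, and a fixed frame interval $\Delta$:

$E_i^{\mathrm{ATE}} = Q_i^{-1}\, S\, P_i, \qquad \mathrm{ATE}_{\mathrm{RMSE}} = \left( \frac{1}{n} \sum_{i=1}^{n} \left\lVert \operatorname{trans}\left(E_i^{\mathrm{ATE}}\right) \right\rVert^{2} \right)^{1/2}$

$E_i^{\mathrm{RPE}} = \left( Q_i^{-1} Q_{i+\Delta} \right)^{-1} \left( P_i^{-1} P_{i+\Delta} \right)$

where $\operatorname{trans}(\cdot)$ extracts the translational component; the RPE RMSE is computed analogously over the translational (and optionally rotational) parts.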


Keywords


Visual SLAM; Visual Odometry (VO); Deep Learning (DL); RGB-D Images; 3D Point Cloud Scene; Camera Pose; Motion Trajectories.


References


J. Zhao and S. Liu, “Research and Implementation of Autonomous Navigation for Mobile Robots Based on SLAM Algorithm under ROS,” Sensors, vol. 22, 2022, doi: 10.3390/s22114172.

P. V. Dhayanithi and T. Mohanraj, “Ros-based evaluation of slam algorithms and autonomous navigation for a mecanum wheeled robot,” in Intelligent Control, Robotics, and Industrial Automation, pp. 13–24, 2023, doi: 10.1007/978-981-99-4634-1_2.

P. T. Karfakis and M. S. Couceiro, “NR5G-SAM: A SLAM Framework for Field Robot Applications Based on 5G New Radio,” Sensors, vol. 23, pp. 1–33, 2023, doi: 10.3390/s23115354.

S. Zheng, J. Wang, C. Rizos, W. Ding, and A. El-Mowafy, “Simultaneous Localization and Mapping (SLAM) for Autonomous Driving: Concept and Analysis,” Remote Sensing, vol. 15, no. 4, pp. 1–41, 2023, doi: 10.3390/rs15041156.

M. Bamdad, D. Scaramuzza, and A. Darvishy, “SLAM for Visually Impaired People: A Survey,” IEEE Access, vol. 11, pp. 1–45, 2023, doi: 10.48550/arXiv.2212.04745.

W. Kontar, S. Ahn, Y. Liu, M. Tight, and Y. Gong, “Research on Comparison of LiDAR and Camera in Autonomous Driving,” Journal of Physics: Conference Series, vol. 2093, 2021, doi: 10.1088/1742-6596/2093/1/012032.

T. Taketomi, H. Uchiyama, and S. Ikeda, “Visual SLAM algorithms: A survey from 2010 to 2016,” IPSJ Transactions on Computer Vision and Applications, vol. 9, no. 6, 2017, doi: 10.1186/s41074-017-0027-2.

L. Jinyu, Y. Bangbang, C. Danpeng, W. Nan, Z. Guofeng, and B. Hujun, “Survey and evaluation of monocular visual-inertial SLAM algorithms for augmented reality,” Virtual Reality and Intelligent Hardware, vol. 1, no. 4, pp. 386–410, 2019, doi: 10.1016/j.vrih.2019.07.002.

D. Lai, Y. Zhang and C. Li, “A Survey of Deep Learning Application in Dynamic Visual SLAM,” 2020 International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), pp. 279-283, 2020, doi: 10.1109/ICBASE51474.2020.00065.

R. Azzam, T. Taha, S. Huang, and Y. Zweiri, “Feature-based visual simultaneous localization and mapping: a survey,” SN Applied Sciences, vol. 2, no. 2, pp. 1–24, 2020, doi: 10.1007/s42452-020-2001-3.

L. Xia, J. Cui, R. Shen, X. Xu, Y. Gao, and X. Li, “A survey of image semantics-based visual simultaneous localization and mapping: Application-oriented solutions to autonomous navigation of mobile robots,” International Journal of Advanced Robotic Systems, vol. 17, no. 3, pp. 1–17, 2020, doi: 10.1177/1729881420919185.

B. Fang, G. Mei, X. Yuan, L. Wang, Z. Wang, and J. Wang, “Visual SLAM for robot navigation in healthcare facility,” Pattern Recognition, vol. 113, 2021, doi: 10.1016/j.patcog.2021.107822.

A. M. Barros, M. Michel, Y. Moline, G. Corre, and F. Carrel, “A Comprehensive Survey of Visual SLAM Algorithms,” Robotics, vol. 11, no. 1, 2022, doi: 10.3390/robotics11010024.

I. Abaspur Kazerouni, L. Fitzgerald, G. Dooly, and D. Toal, “A survey of state-of-the-art on visual SLAM,” Expert Systems with Applications, vol. 205, 2022, doi: 10.1016/j.eswa.2022.117734.

J. Qin, M. Li, D. Li, J. Zhong, and K. Yang, “A Survey on Visual Navigation and Positioning for Autonomous UUVs,” Remote Sensing, vol. 14, no. 15, 2022, doi: 10.3390/rs14153794.

Z. Zhang and J. Zeng, “A Survey on Visual Simultaneously Localization and Mapping,” Frontiers in Computing and Intelligent Systems, vol. 1, no. 1, pp. 18–21, 2022, doi: 10.54097/fcis.v1i1.1089.

K. A. Tsintotas, L. Bampis and A. Gasteratos, “The Revisiting Problem in Simultaneous Localization and Mapping: A Survey on Visual Loop Closure Detection,” in IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 11, pp. 19929-19953, 2022, doi: 10.1109/TITS.2022.3175656.

K. Chen, J. Zhang, J. Liu, Q. Tong, R. Liu, and S. Chen, “Semantic Visual Simultaneous Localization and Mapping: A Survey,” arXiv, vol. 14, no. 8, pp. 1–14, 2022, doi: 10.48550/arXiv.2209.06428.

Y. Tian, H. Yue, B. Yang, and J. Ren, “Unmanned Aerial Vehicle Visual Simultaneous Localization and Mapping: A Survey,” Journal of Physics: Conference Series, vol. 2278, no. 1, 2022, doi: 10.1088/1742-6596/2278/1/012006.

A. Tourani, H. Bavle, J. L. Sanchez-Lopez, and H. Voos, “Visual SLAM: What are the Current Trends and What to Expect?,” Sensors, vol. 22, no. 23, 2022, doi: 10.3390/s22239297.

Y. Dai, J. Wu, and D. Wang, “A Review of Common Techniques for Visual Simultaneous Localization and Mapping,” Journal of Robotics, vol. 2023, 2023, doi: 10.1155/2023/8872822.

S. Mokssit, D. B. Licea, B. Guermah and M. Ghogho, “Deep Learning Techniques for Visual SLAM: A Survey,” in IEEE Access, vol. 11, pp. 20026-20050, 2023, doi: 10.1109/ACCESS.2023.3249661.

M. N. Favorskaya, “Deep Learning for Visual SLAM: The State-of-the-Art and Future Trends,” Electronics, vol. 12, no. 9, 2023, doi: 10.3390/electronics12092006.

L. R. Agostinho, N. M. Ricardo, M. I. Pereira, A. Hiolle and A. M. Pinto, “A Practical Survey on Visual Odometry for Autonomous Driving in Challenging Scenarios and Conditions,” in IEEE Access, vol. 10, pp. 72182-72205, 2022, doi: 10.1109/ACCESS.2022.3188990.

A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.

M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder, S. Omari, M. W. Achtelik, and R. Siegwart, “The EuRoC micro aerial vehicle datasets,” International Journal of Robotics Research, vol. 35, no. 10, pp. 1157–1163, 2016, doi: 10.1177/0278364915620033.

D. Schubert, T. Goll, N. Demmel, V. Usenko, J. Stückler and D. Cremers, “The TUM VI Benchmark for Evaluating Visual-Inertial Odometry,” 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1680-1687, 2018, doi: 10.1109/IROS.2018.8593419.

S. Cortes, A. Solin, E. Rahtu, and J. Kannala, “ADVIO: An authentic dataset for visual-inertial odometry,” Proceedings of the European Conference on Computer Vision (ECCV), pp. 419–434, 2018, doi: 10.1007/978-3-030-01249-6_26.

C. Theodorou, V. Velisavljevic, V. Dyo, and F. Nonyelu, “Visual SLAM algorithms and their application for AR, mapping, localization and wayfinding,” Array, vol. 15, 2022, doi: 10.1016/j.array.2022.100222.

D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” Advances in Neural Information Processing Systems, vol. 3, pp. 2366–2374, 2014.

W. Chen, Z. Fu, D. Yang, and J. Deng, “Single-image depth perception in the wild,” Advances in Neural Information Processing Systems, pp. 730–738, 2016.

T. Zhou, M. Brown, N. Snavely and D. G. Lowe, “Unsupervised Learning of Depth and Ego-Motion from Video,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6612-6619, 2017, doi: 10.1109/CVPR.2017.700.

C. Wang, J. M. Buenaposada, R. Zhu and S. Lucey, “Learning Depth from Monocular Videos Using Direct Methods,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2022-2030, 2018, doi: 10.1109/CVPR.2018.00216.

F. Steinbrücker, J. Sturm, and D. Cremers, “Real-time visual odometry from dense RGB-D images,” in Proceedings of the IEEE International Conference on Computer Vision, no. 3, pp. 719–722, 2011, doi: 10.1109/ICCVW.2011.6130321.

R. Garg, B. G. Vijay Kumar, G. Carneiro, and I. Reid, “Unsupervised CNN for single view depth estimation: Geometry to the rescue,” Computer Vision – ECCV 2016, vol. 9912, pp. 740–756, 2016, doi: 10.1007/978-3-319-46484-8_45.

C. Godard, O. M. Aodha and G. J. Brostow, “Unsupervised Monocular Depth Estimation with Left-Right Consistency,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6602-6611, 2017, doi: 10.1109/CVPR.2017.699.

V. Casser, S. Pirk, R. Mahjourian and A. Angelova, “Unsupervised Monocular Depth and Ego-Motion Learning With Structure and Semantics,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 381-388, 2019, doi: 10.1109/CVPRW.2019.00051.

J.-W. Bian, Z. Li, N. Wang, H. Zhan, C. Shen, M.-m. Cheng, and I. Reid, “Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video,” arXiv, pp. 1–12, 2019.

A. Geiger, P. Lenz and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, pp. 3354-3361, 2012, doi: 10.1109/CVPR.2012.6248074.

A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013, doi: 10.1177/0278364913491297.

M. Menze and A. Geiger, “Object scene flow for autonomous vehicles,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3061-3070, 2015, doi: 10.1109/CVPR.2015.7298925.

N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor Segmentation and Support Inference from RGBD Images,” Computer Vision – ECCV 2012, pp. 746–760, 2012, doi: 10.1007/978-3-642-33715-4_54.

A. Saxena, M. Sun and A. Y. Ng, “Make3D: Learning 3D Scene Structure from a Single Still Image,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 824-840, 2009, doi: 10.1109/TPAMI.2008.132.

M. Cordts et al., “The Cityscapes Dataset for Semantic Urban Scene Understanding,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3213-3223, 2016, doi: 10.1109/CVPR.2016.350.

J. Sturm, N. Engelhard, F. Endres, W. Burgard and D. Cremers, “A benchmark for the evaluation of RGB-D SLAM systems,” 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 573-580, 2012, doi: 10.1109/IROS.2012.6385773.

A. Handa, T. Whelan, J. McDonald and A. J. Davison, “A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM,” 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 1524-1531, 2014, doi: 10.1109/ICRA.2014.6907054.

X. Ye, X. Ji, B. Sun, S. Chen, Z. Wang, and H. Li, “DRMSLAM: Towards dense reconstruction of monocular SLAM with scene depth fusion,” Neurocomputing, vol. 396, pp. 76–91, 2020, doi: 10.1016/j.neucom.2020.02.044.

F. Mumuni, A. Mumuni, and C. K. Amuzuvi, “Deep learning of monocular depth, optical flow and ego-motion with geometric guidance for UAV navigation in dynamic environments,” Machine Learning with Applications, vol. 10, 2022, doi: 10.1016/j.mlwa.2022.100416.

C. S. Weerasekera, Y. Latif, R. Garg and I. Reid, “Dense monocular reconstruction using surface normals,” 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2524-2531, 2017, doi: 10.1109/ICRA.2017.7989293.

F. Liu, C. Shen and G. Lin, “Deep convolutional neural fields for depth estimation from a single image,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5162-5170, 2015, doi: 10.1109/CVPR.2015.7299152.

D. Zoran, P. Isola, D. Krishnan and W. T. Freeman, “Learning Ordinal Relationships for Mid-Level Vision,” 2015 IEEE International Conference on Computer Vision (ICCV), pp. 388-396, 2015, doi: 10.1109/ICCV.2015.52.

P. Wang, X. Shen, Z. Lin, S. Cohen, B. Price and A. Yuille, “Towards unified depth and semantic prediction from a single image,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2800-2809, 2015, doi: 10.1109/CVPR.2015.7298897.

D. Eigen and R. Fergus, “Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture,” 2015 IEEE International Conference on Computer Vision (ICCV), pp. 2650-2658, 2015, doi: 10.1109/ICCV.2015.304.

F. Liu, C. Shen, G. Lin and I. Reid, “Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 2024-2039, 2016, doi: 10.1109/TPAMI.2015.2505283.

I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari and N. Navab, “Deeper Depth Prediction with Fully Convolutional Residual Networks,” 2016 Fourth International Conference on 3D Vision (3DV), pp. 239-248, 2016, doi: 10.1109/3DV.2016.32.

F. Ma and S. Karaman, “Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image,” 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 4796-4803, 2018, doi: 10.1109/ICRA.2018.8460184.

Z. Chen, V. Badrinarayanan, G. Drozdov, and A. Rabinovich, “Estimating Depth from RGB and Sparse Sensing,” Lecture Notes in Computer Science, vol. 11208, pp. 176–192, 2018, doi: 10.1007/978-3-030-01225-0_11.

Z. Yang, P. Wang, W. Xu, L. Zhao, and R. Nevatia, “Unsupervised Learning of Geometry from Videos with Edge-Aware Depth-Normal Consistency,” in The Thirty-Second AAAI Conference on Artificial Intelligence, vol. 32, no. 1, pp. 7493–7500, 2018, doi: 10.1609/aaai.v32i1.12257.

R. Mahjourian, M. Wicke and A. Angelova, “Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5667-5675, 2018, doi: 10.1109/CVPR.2018.00594.

Z. Yin and J. Shi, “GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1983-1992, 2018, doi: 10.1109/CVPR.2018.00212.

Y. Zou, Z. Luo, and J. B. Huang, “DF-Net: Unsupervised Joint Learning of Depth and Flow Using Cross-Task Consistency,” Lecture Notes in Computer Science, vol. 11209, pp. 38–55, 2018, doi: 10.1007/978-3-030-01228-1_3.

A. Ranjan et al., “Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12232-12241, 2019, doi: 10.1109/CVPR.2019.01252.

C. Godard, O. M. Aodha, M. Firman and G. Brostow, “Digging Into Self-Supervised Monocular Depth Estimation,” 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3827-3837, 2019, doi: 10.1109/ICCV.2019.00393.

V. Guizilini, R. Ambrus, S. Pillai, A. Raventos and A. Gaidon, “3D Packing for Self-Supervised Monocular Depth Estimation,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2482-2491, 2020, doi: 10.1109/CVPR42600.2020.00256.

C. Luo et al., “Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 10, pp. 2624-2641, 2020, doi: 10.1109/TPAMI.2019.2930258.

H. Y. Lee, H. W. Ho, and Y. Zhou, “Deep Learning-based Monocular Obstacle Avoidance for Unmanned Aerial Vehicle Navigation in Tree Plantations: Faster Region-based Convolutional Neural Network Approach,” Journal of Intelligent and Robotic Systems: Theory and Applications, vol. 101, no. 5, 2021, doi: 10.1007/s10846-020-01284-z.

A. Dosovitskiy et al., “FlowNet: Learning Optical Flow with Convolutional Networks,” 2015 IEEE International Conference on Computer Vision (ICCV), pp. 2758-2766, 2015, doi: 10.1109/ICCV.2015.316.

E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy and T. Brox, “FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1647-1655, 2017, doi: 10.1109/CVPR.2017.179.

A. Ranjan and M. J. Black, “Optical Flow Estimation Using a Spatial Pyramid Network,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2720-2729, 2017, doi: 10.1109/CVPR.2017.291.

D. Sun, X. Yang, M. -Y. Liu and J. Kautz, “PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8934-8943, 2018, doi: 10.1109/CVPR.2018.00931.

Z. Teed and J. Deng, “RAFT: Recurrent All-Pairs Field Transforms for Optical Flow,” Computer Vision – ECCV 2020, vol. 12347, pp. 4839–4843, 2020, doi: 10.1007/978-3-030-58536-5_24.

Z. Ren, J. Yan, B. Ni, B. Liu, X. Yang, and H. Zha, “Unsupervised deep learning for optical flow estimation,” in 31st AAAI Conference on Artificial Intelligence, AAAI 2017, vol. 31, no. 1, pp. 1495–1501, 2017, doi: 10.1609/aaai.v31i1.10723.

Y. Zhu, Z. Lan, S. Newsam, and A. G. Hauptmann, “Guided Optical Flow Learning,” arXiv, 2017, doi: 10.48550/arXiv.1702.02295.

Y. Wang, Y. Yang, Z. Yang, L. Zhao, P. Wang and W. Xu, “Occlusion Aware Unsupervised Learning of Optical Flow,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4884-4893, 2018, doi: 10.1109/CVPR.2018.00513.

J. Janai, F. Güney, A. Ranjan, M. Black, and A. Geiger, “Unsupervised Learning of Multi-Frame Optical Flow with Occlusions,” Lecture Notes in Computer Science, vol. 11220, pp. 713–731, 2018, doi: 10.1007/978-3-030-01270-0_42.

Y. Zhong, P. Ji, J. Wang, Y. Dai and H. Li, “Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12087-12096, 2019, doi: 10.1109/CVPR.2019.01237.

B. Liao, J. Hu, and R. O. Gilmore, “Optical flow estimation combining with illumination adjustment and edge refinement in livestock UAV videos,” Computers and Electronics in Agriculture, vol. 180, 2021, doi: 10.1016/j.compag.2020.105910.

W. Yan, A. Sharma and R. T. Tan, “Optical Flow in Dense Foggy Scenes Using Semi-Supervised Learning,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13256-13265, 2020, doi: 10.1109/CVPR42600.2020.01327.

Q. Dai, V. Patil, S. Hecker, D. Dai, L. Van Gool and K. Schindler, “Self-supervised Object Motion and Depth Estimation from Video,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 4326-4334, 2020, doi: 10.1109/CVPRW50498.2020.00510.

D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black, “A Naturalistic Open Source Movie for Optical Flow Evaluation,” in Computer Vision – ECCV 2012, vol. 7577, pp. 611–625, 2012, doi: 10.1007/978-3-642-33783-3_44.

S. Baker, S. Roth, D. Scharstein, M. J. Black, J. P. Lewis and R. Szeliski, “A Database and Evaluation Methodology for Optical Flow,” 2007 IEEE 11th International Conference on Computer Vision, pp. 1-8, 2007, doi: 10.1109/ICCV.2007.4408903.

D. Berman, T. Treibitz and S. Avidan, “Single Image Dehazing Using Haze-Lines,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 3, pp. 720-734, 2020, doi: 10.1109/TPAMI.2018.2882478.

C. Bailer, B. Taetz and D. Stricker, “Flow Fields: Dense Correspondence Fields for Highly Accurate Large Displacement Optical Flow Estimation,” 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4015-4023, 2015, doi: 10.1109/ICCV.2015.457.

W. Hartmann, M. Havlena and K. Schindler, “Predicting Matchability,” 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 9-16, 2014, doi: 10.1109/CVPR.2014.9.

Y. Verdie, K. M. Yi, P. Fua and V. Lepetit, “TILDE: A Temporally Invariant Learned DEtector,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5279-5288, 2015, doi: 10.1109/CVPR.2015.7299165.

X. Shen et al., “RF-Net: An End-To-End Image Matching Network Based on Receptive Field,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8124-8132, 2019, doi: 10.1109/CVPR.2019.00832.

N. Jacobs, N. Roman and R. Pless, “Consistent Temporal Variations in Many Outdoor Scenes,” 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-6, 2007, doi: 10.1109/CVPR.2007.383258.

K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, “A comparison of affine region detectors,” International Journal of Computer Vision, vol. 65, pp. 43–72, 2005, doi: 10.1007/s11263-005-3848-x.

C. L. Zitnick and K. Ramnath, “Edge foci interest points,” 2011 International Conference on Computer Vision, pp. 359-366, 2011, doi: 10.1109/ICCV.2011.6126263.

V. Balntas, K. Lenc, A. Vedaldi and K. Mikolajczyk, “HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3852-3861, 2017, doi: 10.1109/CVPR.2017.410.

E. Rosten and T. Drummond, “Machine Learning for High-Speed Corner Detection,” Computer Vision – ECCV 2006, vol. 3951, pp. 430–443, 2006, doi: 10.1007/11744023_34.

W. Förstner, T. Dickscheid and F. Schindler, “Detecting interpretable and accurate scale-invariant keypoints,” 2009 IEEE 12th International Conference on Computer Vision, pp. 2256-2263, 2009, doi: 10.1109/ICCV.2009.5459458.

P. Mainali, G. Lafruit, Q. Yang, B. Geelen, L. V. Gool, and R. Lauwereins, “SIFER: Scale-invariant feature detector with error resilience,” International Journal of Computer Vision, vol. 104, no. 2, pp. 172–197, 2013, doi: 10.1007/s11263-013-0622-3.

D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004, doi: 10.1023/B:VISI.0000029664.99615.94.

H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in Computer Vision – ECCV 2006, vol. 3951, pp. 404–417, 2006, doi: 10.1007/11744023_32.

S. Salti, A. Lanza and L. Di Stefano, “Keypoints from symmetries by wave propagation,” 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2898-2905, 2013, doi: 10.1109/CVPR.2013.373.

C. L. Zitnick and K. Ramnath, “Edge foci interest points,” 2011 International Conference on Computer Vision, pp. 359-366, 2011, doi: 10.1109/ICCV.2011.6126263.

Y. Tian, B. Fan and F. Wu, “L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6128-6136, 2017, doi: 10.1109/CVPR.2017.649.

A. Mishchuk, D. Mishkin, F. Radenović, and J. Matas, “Working hard to know your neighbor’s margins: Local descriptor learning loss,” Advances in Neural Information Processing Systems, vol. 30, pp. 1–12, 2017.

Y. Ono, P. Fua, E. Trulls, and K. M. Yi, “LF-Net: Learning local features from images,” Advances in Neural Information Processing Systems, vol. 31, pp. 6234–6244, 2018.

Z. Qin, M. Yin, G. Li, and F. Y. A, “SP-Flow: Self-supervised optical flow correspondence point prediction for real-time SLAM,” Computer Aided Geometric Design, vol. 82, 2020, doi: 10.1016/j.cagd.2020.101928.

R. Mur-Artal and J. D. Tardós, “Visual-Inertial Monocular SLAM With Map Reuse,” in IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 796-803, 2017, doi: 10.1109/LRA.2017.2653359.

H. M. S. Bruno and E. L. Colombini, “LIFT-SLAM: A deep-learning feature-based monocular visual SLAM method,” Neurocomputing, vol. 455, pp. 97–110, 2021, doi: 10.1016/j.neucom.2021.05.027.

R. Mur-Artal, J. M. M. Montiel and J. D. Tardós, “ORB-SLAM: A Versatile and Accurate Monocular SLAM System,” in IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147-1163, 2015, doi: 10.1109/TRO.2015.2463671.

M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder, S. Omari, M. W. Achtelik, and R. Siegwart, “The EuRoC micro aerial vehicle datasets,” The International Journal of Robotics Research, vol. 35, no. 10, 2016, doi: 10.1177/0278364915620033.

J. Tang, L. Ericson, J. Folkesson and P. Jensfelt, “GCNv2: Efficient Correspondence Prediction for Real-Time SLAM,” in IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 3505-3512, 2019, doi: 10.1109/LRA.2019.2927954.

Y. Sun, M. Liu, and M. Q. Meng, “Improving RGB-D SLAM in dynamic environments : A motion removal approach,” Robotics and Autonomous Systems, vol. 89, pp. 110–122, 2017, doi: 10.1016/j.robot.2016.11.012.

M. Kaneko, K. Iwami, T. Ogawa, T. Yamasaki and K. Aizawa, “Mask-SLAM: Robust Feature-Based Monocular SLAM by Masking Using Semantic Segmentation,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 371-3718, 2018, doi: 10.1109/CVPRW.2018.00063.

L. -C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. L. Yuille, “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, 2018, doi: 10.1109/TPAMI.2017.2699184.

C. Yu et al., “DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments,” 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1168-1174, 2018, doi: 10.1109/IROS.2018.8593691.

B. Bescos, J. M. Fácil, J. Civera and J. Neira, “DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes,” in IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 4076-4083, 2018, doi: 10.1109/LRA.2018.2860039.

F. Zhong, S. Wang, Z. Zhang, C. Chen and Y. Wang, “Detect-SLAM: Making Object Detection and SLAM Mutually Beneficial,” 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1001-1010, 2018, doi: 10.1109/WACV.2018.00115.

G. Tian, L. Liu, J. H. Ri, Y. Liu, and Y. Sun, “ObjectFusion: An object detection and segmentation framework with RGB-D SLAM and convolutional neural networks,” Neurocomputing, vol. 345, pp. 3–14, 2019, doi: 10.1016/j.neucom.2019.01.088.

J. Cheng, Y. Sun, and M. Q.-H. Meng, “Improving monocular visual SLAM in dynamic environments: An optical-flow-based approach,” Advanced Robotics, vol. 33, no. 12, pp. 576–589, 2019, doi: 10.1080/01691864.2019.1610060.

C. Shao, C. Zhang, Z. Fang and G. Yang, “A Deep Learning-Based Semantic Filter for RANSAC-Based Fundamental Matrix Calculation and the ORB-SLAM System,” in IEEE Access, vol. 8, pp. 3212-3223, 2020, doi: 10.1109/ACCESS.2019.2962268.

L. Xu, C. Feng, V. R. Kamat, and C. C. Menassa, “A scene-adaptive descriptor for visual SLAM-based locating applications in built environments,” Automation in Construction, vol. 112, 2020, doi: 10.1016/j.autcon.2019.103067.

W. Liu, Y. Mo, J. Jiao and Z. Deng, “EF-Razor: An Effective Edge-Feature Processing Method in Visual SLAM,” in IEEE Access, vol. 8, pp. 140798-140805, 2020, doi: 10.1109/ACCESS.2020.3013806.

I. Rusli, B. R. Trilaksono and W. Adiprawita, “RoomSLAM: Simultaneous Localization and Mapping With Objects and Indoor Layout Structure,” in IEEE Access, vol. 8, pp. 196992-197004, 2020, doi: 10.1109/ACCESS.2020.3034537.

S. J. A, L. Chen, R. S. A, and S. McLoone, “A novel vSLAM framework with unsupervised semantic segmentation based on adversarial transfer learning,” Applied Soft Computing Journal, vol. 90, 2020, doi: 10.1016/j.asoc.2020.106153.

X. Zhao, C. Wang and M. H. Ang, “Real-Time Visual-Inertial Localization Using Semantic Segmentation Towards Dynamic Environments,” in IEEE Access, vol. 8, pp. 155047-155059, 2020, doi: 10.1109/ACCESS.2020.3018557.

T. Qin, P. Li and S. Shen, “VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator,” in IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004-1020, 2018, doi: 10.1109/TRO.2018.2853729.

J. Cheng, Z. Wang, H. Zhou, L. Li, and J. Yao, “DM-SLAM: A Feature-Based SLAM System for Rigid Dynamic Scenes,” International Journal of Geo-Information, vol. 9, no. 4, pp. 1–18, 2020, doi: 10.3390/ijgi9040202.

Y. Liu and J. Miura, “RDS-SLAM: Real-Time Dynamic SLAM Using Semantic Segmentation Methods,” in IEEE Access, vol. 9, pp. 23772-23785, 2021, doi: 10.1109/ACCESS.2021.3050617.

P. Su, S. Luo and X. Huang, “Real-Time Dynamic SLAM Algorithm Based on Deep Learning,” in IEEE Access, vol. 10, pp. 87754-87766, 2022, doi: 10.1109/ACCESS.2022.3199350.

A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An Open Urban Driving Simulator,” arXiv, pp. 1–16, 2017.

A. Handa, T. Whelan, J. McDonald and A. J. Davison, “A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM,” 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 1524-1531, 2014, doi: 10.1109/ICRA.2014.6907054.

M. Grupp, “evo: Python package for the evaluation of odometry and SLAM,” 2017, https://github.com/MichaelGrupp/evo.

Z. X. Zou, S. S. Huang, T. J. Mu, and Y. P. Wang, “ObjectFusion: Accurate object-level SLAM with neural object priors,” Graphical Models, vol. 123, 2022, doi: 10.1016/j.gmod.2022.101165.

B. Xu, W. Li, D. Tzoumanikas, M. Bloesch, A. Davison and S. Leutenegger, “MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM,” 2019 International Conference on Robotics and Automation (ICRA), pp. 5231-5237, 2019, doi: 10.1109/ICRA.2019.8794371.

Y. Zhu, R. Gao, S. Huang, S. -C. Zhu and Y. N. Wu, “Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9954-9963, 2021, doi: 10.1109/CVPR46437.2021.00983.

C. Qiao, Z. Xiang, and X. Wang, “Objects Matter: Learning Object Relation Graph for Robust Camera Relocalization,” arXiv, 2022.

J. McCormac, A. Handa, S. Leutenegger and A. J. Davison, “SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?,” 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2697-2706, 2017, doi: 10.1109/ICCV.2017.292.

J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi and A. Fitzgibbon, “Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images,” 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2930-2937, 2013, doi: 10.1109/CVPR.2013.377.

M. Runz, M. Buffier and L. Agapito, “MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects,” 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 10-20, 2018, doi: 10.1109/ISMAR.2018.00024.

O. Kähler, V. A. Prisacariu, C. Y. Ren, X. Sun, P. Torr and D. Murray, “Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices,” in IEEE Transactions on Visualization and Computer Graphics, vol. 21, no. 11, pp. 1241-1250, 2015, doi: 10.1109/TVCG.2015.2459891.

A. Dai, M. Nießner, M. Zollhöfer, S. Izadi, and C. Theobalt, “BundleFusion: Real-time Globally Consistent 3D Reconstruction using On-the-fly Surface Re-integration,” arXiv, pp. 36–47, 2017, doi: 10.5244/C.31.158.

A. Kendall and R. Cipolla, “Geometric Loss Functions for Camera Pose Regression with Deep Learning,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6555-6564, 2017, doi: 10.1109/CVPR.2017.694.

S. Brahmbhatt, J. Gu, K. Kim, J. Hays and J. Kautz, “Geometry-Aware Learning of Maps for Camera Localization,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2616-2625, 2018, doi: 10.1109/CVPR.2018.00277.

C. Zhao, L. Sun, Z. Yan, G. Neumann, T. Duckett, and R. Stolkin, “Learning Kalman Network: A deep monocular visual odometry for on-road driving,” Robotics and Autonomous Systems, vol. 121, 2019, doi: 10.1016/j.robot.2019.07.004.

C. Tao, Z. Gao, J. Yan, C. Li and G. Cui, “Indoor 3D Semantic Robot VSLAM Based on Mask Regional Convolutional Neural Network,” in IEEE Access, vol. 8, pp. 52906-52916, 2020, doi: 10.1109/ACCESS.2020.2981648.

X. Han, S. Li, X. Wang, and W. Zhou, “Semantic mapping for mobile robots in indoor scenes: A survey,” Information, vol. 12, no. 2, pp. 1–14, 2021, doi: 10.3390/info12020092.

J. McCormac, A. Handa, A. Davison and S. Leutenegger, “SemanticFusion: Dense 3D semantic mapping with convolutional neural networks,” 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 4628-4635, 2017, doi: 10.1109/ICRA.2017.7989538.

N. Sünderhauf, T. T. Pham, Y. Latif, M. Milford and I. Reid, “Meaningful maps with object-oriented semantic mapping,” 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5079-5085, 2017, doi: 10.1109/IROS.2017.8206392.

S. Yang, Y. Huang and S. Scherer, “Semantic 3D occupancy mapping through efficient high order CRFs,” 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 590-597, 2017, doi: 10.1109/IROS.2017.8202212.

M. Grinvald et al., “Volumetric Instance-Aware Semantic Mapping and 3D Object Discovery,” in IEEE Robotics and Automation Letters, vol. 4, no. 3, pp. 3037-3044, 2019, doi: 10.1109/LRA.2019.2923960.

P. Karkus, A. Angelova, V. Vanhoucke and R. Jonschkowski, “Differentiable Mapping Networks: Learning Structured Map Representations for Sparse Visual Localization,” 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 4753-4759, 2020, doi: 10.1109/ICRA40945.2020.9197452.

A. Geiger, J. Ziegler and C. Stiller, “StereoScan: Dense 3d reconstruction in real-time,” 2011 IEEE Intelligent Vehicles Symposium (IV), pp. 963-968, 2011, doi: 10.1109/IVS.2011.5940405.

T. Haarnoja, A. Ajay, S. Levine, and P. Abbeel, “Backprop KF: Learning discriminative deterministic state estimators,” Advances in Neural Information Processing Systems, vol. 29, pp. 4383–4391, 2016.

H. Coskun, F. Achilles, R. DiPietro, N. Navab and F. Tombari, “Long Short-Term Memory Kalman Filters: Recurrent Neural Estimators for Pose Regularization,” 2017 IEEE International Conference on Computer Vision (ICCV), pp. 5525-5533, 2017, doi: 10.1109/ICCV.2017.589.

Y. Hou, H. Zhang and S. Zhou, “Convolutional neural network-based image representation for visual loop closure detection,” 2015 IEEE International Conference on Information and Automation, pp. 2238-2245, 2015, doi: 10.1109/ICInfA.2015.7279659.

Y. Xia, J. Li, L. Qi, H. Yu and J. Dong, “An Evaluation of Deep Learning in Loop Closure Detection for Visual SLAM,” 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 85-91, 2017, doi: 10.1109/iThings-GreenCom-CPSCom-SmartData.2017.18.

X. Zhang, Y. Su and X. Zhu, “Loop closure detection for visual SLAM systems using convolutional neural network,” 2017 23rd International Conference on Automation and Computing (ICAC), pp. 1-6, 2017, doi: 10.23919/IConAC.2017.8082072.

N. Merrill and G. Huang, “Lightweight Unsupervised Deep Loop Closure,” Robotics: Science and Systems, 2018.

A. R. Memon, H. Wang, and A. Hussain, “Loop closure detection using supervised and unsupervised deep neural networks for monocular SLAM systems,” Robotics and Autonomous Systems, vol. 126, 2020, doi: 10.1016/j.robot.2020.103470.

J. Chang, N. Dong, D. Li, and M. Qin, “Triplet loss based metric learning for closed loop detection in VSLAM system,” Expert Systems with Applications, vol. 185, 2021, doi: 10.1016/j.eswa.2021.115646.

R. Duan, Y. Feng, and C.-Y. Wen, “Deep Pose Graph-Matching-Based Loop Closure Detection for Semantic Visual SLAM,” Sustainability, vol. 14, no. 9, 2022, doi: 10.3390/su141911864.

M. Cummins and P. Newman, “FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance,” The International Journal of Robotics Research, vol. 27, no. 6, 2008, doi: 10.1177/0278364908090961.

D. Gálvez-López and J. D. Tardós, “Bags of Binary Words for Fast Place Recognition in Image Sequences,” in IEEE Transactions on Robotics, vol. 28, no. 5, pp. 1188-1197, 2012, doi: 10.1109/TRO.2012.2197158.

rmsalinas, “DBoW3,” 2017, https://github.com/rmsalinas/DBow3.

E. Garcia-Fidalgo and A. Ortiz, “iBoW-LCD: An Appearance-Based Loop-Closure Detection Approach Using Incremental Bags of Binary Words,” in IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3051-3057, 2018, doi: 10.1109/LRA.2018.2849609.

P. -E. Sarlin, C. Cadena, R. Siegwart and M. Dymczyk, “From Coarse to Fine: Robust Hierarchical Localization at Large Scale,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12708-12717, 2019, doi: 10.1109/CVPR.2019.01300.

A. Kendall, M. Grimes and R. Cipolla, “PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization,” 2015 IEEE International Conference on Computer Vision (ICCV), pp. 2938-2946, 2015, doi: 10.1109/ICCV.2015.336.

A. Kendall and R. Cipolla, “Modelling uncertainty in deep learning for camera relocalization,” 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 4762-4769, 2016, doi: 10.1109/ICRA.2016.7487679.

I. Melekhov, J. Ylioinas, J. Kannala and E. Rahtu, “Image-Based Localization Using Hourglass Networks,” 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 870-877, 2017, doi: 10.1109/ICCVW.2017.107.

F. Walch, C. Hazirbas, L. Leal-Taixé, T. Sattler, S. Hilsenbeck and D. Cremers, “Image-Based Localization Using LSTMs for Structured Feature Correlation,” 2017 IEEE International Conference on Computer Vision (ICCV), pp. 627-637, 2017, doi: 10.1109/ICCV.2017.75.

J. Wu, L. Ma and X. Hu, “Delving deeper into convolutional neural networks for camera relocalization,” 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 5644-5651, 2017, doi: 10.1109/ICRA.2017.7989663.

X. Wang, X. Wang, C. Wang, X. Bai, and J. Wu, “Discriminative Features Matter: Multi-layer Bilinear Pooling for Camera Localization,” in British Machine Vision Conference, pp. 1–12, 2019.

M. Bui, C. Baur, N. Navab, S. Ilic and S. Albarqouni, “Adversarial Networks for Camera Pose Regression and Refinement,” 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 3778-3787, 2019, doi: 10.1109/ICCVW.2019.00470.

M. Cai, C. Shen, and I. Reid, “A hybrid probabilistic model for camera relocalization,” in British Machine Vision Conference 2018, BMVC 2018, pp. 1–12, 2019.

S. Saha, G. Varma, and C. V. Jawahar, “Improved visual relocalization by discovering anchor points,” in British Machine Vision Conference 2018, BMVC 2018, pp. 1–11, 2018.

B. Wang, C. Chen, C. X. Lu, P. Zhao, N. Trigoni, and A. Markham, “AtLoc: Attention guided camera localization,” in AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, vol. 34, no. 6, 2020, doi: 10.1609/aaai.v34i06.6608.

M. O. Turkoglu, E. Brachmann, K. Schindler, G. J. Brostow and A. Monszpart, “Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision,” 2021 International Conference on 3D Vision (3DV), pp. 145-155, 2021, doi: 10.1109/3DV53792.2021.00025.

J. Engel, V. Koltun and D. Cremers, “Direct Sparse Odometry,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, pp. 611-625, 2018, doi: 10.1109/TPAMI.2017.2658577.

N. Fanani, A. Stürck, M. Ochs, H. Bradler, and R. Mester, “Predictive monocular odometry (PMO): What is possible without RANSAC and multiframe bundle adjustment?,” Image and Vision Computing, vol. 68, pp. 3–13, 2017, doi: 10.1016/j.imavis.2017.08.002.

R. Wang, M. Schwörer and D. Cremers, “Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras,” 2017 IEEE International Conference on Computer Vision (ICCV), pp. 3923-3931, 2017, doi: 10.1109/ICCV.2017.421.

F. Xue, Q. Wang, X. Wang, W. Dong, J. Wang, and H. Zha, “Guided Feature Selection for Deep Visual Odometry,” Lecture Notes in Computer Science, vol. 11366, pp. 293–308, 2019, doi: 10.1007/978-3-030-20876-9_19.

R. Kreuzig, M. Ochs and R. Mester, “DistanceNet: Estimating Traveled Distance From Monocular Images Using a Recurrent Convolutional Neural Network,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1258-1266, 2019, doi: 10.1109/CVPRW.2019.00165.

A. Saxena, S. H. Chung, and A. Y. Ng, “3-D Depth Reconstruction from a Single Still Image,” International Journal of Computer Vision, vol. 76, pp. 53–69, 2008, doi: 10.1007/s11263-007-0071-y.

A. Roy and S. Todorovic, “Monocular Depth Estimation Using Neural Regression Forest,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5506-5514, 2016, doi: 10.1109/CVPR.2016.594.

M. Mancini, G. Costante, P. Valigi and T. A. Ciarfuglia, “Fast robust monocular depth estimation for Obstacle Detection with fully convolutional networks,” 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4296-4303, 2016, doi: 10.1109/IROS.2016.7759632.

C. Cadena, A. Dick, and I. D. Reid, “Multi-modal auto-encoders as joint estimators for robotics scene understanding,” Robotics: Science and Systems, vol. 12, 2016, doi: 10.15607/rss.2016.xii.041.

D. Xu, E. Ricci, W. Ouyang, X. Wang and N. Sebe, “Multi-scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 161-169, 2017, doi: 10.1109/CVPR.2017.25.

Y. Liao, L. Huang, Y. Wang, S. Kodagoda, Y. Yu and Y. Liu, “Parse geometry from a line: Monocular depth estimation with partial laser observation,” 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 5059-5066, 2017, doi: 10.1109/ICRA.2017.7989590.

D. Xu, W. Wang, H. Tang, H. Liu, N. Sebe and E. Ricci, “Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3917-3925, 2018, doi: 10.1109/CVPR.2018.00412.

Y. Li, K. Qian, T. Huang, and J. Zhou, “Depth estimation from monocular image and coarse depth points based on conditional GAN,” in MATEC Web of Conferences, vol. 175, pp. 1–5, 2018, doi: 10.1051/matecconf/201817503055.

A. Wang, Z. Fang, Y. Gao, X. Jiang and S. Ma, “Depth Estimation of Video Sequences With Perceptual Losses,” in IEEE Access, vol. 6, pp. 30536-30546, 2018, doi: 10.1109/ACCESS.2018.2846546.

D. Wofk, F. Ma, T. -J. Yang, S. Karaman and V. Sze, “FastDepth: Fast Monocular Depth Estimation on Embedded Systems,” 2019 International Conference on Robotics and Automation (ICRA), pp. 6101-6108, 2019, doi: 10.1109/ICRA.2019.8794182.

S. Gur and L. Wolf, “Single Image Depth Estimation Trained via Depth From Defocus Cues,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7675-7684, 2019, doi: 10.1109/CVPR.2019.00787.

D. Xu, E. Ricci, W. Ouyang, X. Wang and N. Sebe, “Monocular Depth Estimation Using Multi-Scale Continuous CRFs as Sequential Deep Networks,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 6, pp. 1426-1440, 2019, doi: 10.1109/TPAMI.2018.2839602.

X. Tu, C. Xu, S. Liu, G. Xie and R. Li, “Real-Time Depth Estimation with an Optimized Encoder-Decoder Architecture on Embedded Devices,” 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 2141-2149, 2019, doi: 10.1109/HPCC/SmartCity/DSS.2019.00296.

T. -H. Wang, F. -E. Wang, J. -T. Lin, Y. -H. Tsai, W. -C. Chiu and M. Sun, “Plug-and-Play: Improve Depth Prediction via Sparse Data Propagation,” 2019 International Conference on Robotics and Automation (ICRA), pp. 5880-5886, 2019, doi: 10.1109/ICRA.2019.8794404.

J. Hu, Y. Zhang and T. Okatani, “Visualization of Convolutional Neural Networks for Monocular Depth Estimation,” 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3868-3877, 2019, doi: 10.1109/ICCV.2019.00397.

X. Tu et al., “Learning Depth for Scene Reconstruction Using an Encoder-Decoder Model,” in IEEE Access, vol. 8, pp. 89300-89317, 2020, doi: 10.1109/ACCESS.2020.2993494.

M. Weber, C. Rist and J. M. Zöllner, “Learning temporal features with CNNs for monocular visual ego motion estimation,” 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), pp. 1-6, 2017, doi: 10.1109/ITSC.2017.8317922.

S. Wang, R. Clark, H. Wen and N. Trigoni, “DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks,” 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2043-2050, 2017, doi: 10.1109/ICRA.2017.7989236.

V. Peretroukhin, L. Clement and J. Kelly, “Reducing drift in visual odometry by inferring sun direction using a Bayesian Convolutional Neural Network,” 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2035-2042, 2017, doi: 10.1109/ICRA.2017.7989235.

R. Li, S. Wang, Z. Long and D. Gu, “UnDeepVO: Monocular Visual Odometry Through Unsupervised Deep Learning,” 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 7286-7291, 2018, doi: 10.1109/ICRA.2018.8461251.

H. Zhan, R. Garg, C. S. Weerasekera, K. Li, H. Agarwal and I. M. Reid, “Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 340-349, 2018, doi: 10.1109/CVPR.2018.00043.

E. J. Shamwell, S. Leung and W. D. Nothwang, “Vision-Aided Absolute Trajectory Estimation Using an Unsupervised Deep Network with Online Error Correction,” 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2524-2531, 2018, doi: 10.1109/IROS.2018.8593573.

N. Yang, L. von Stumberg, R. Wang and D. Cremers, “D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1278-1289, 2020, doi: 10.1109/CVPR42600.2020.00136.

Y. A. M. Turan, A. E. Sarı, M. R. U. Saputra, P. P. B. de Gusmão, A. Markham, and N. Trigoni, “SelfVIO: Self-Supervised Deep Monocular Visual-Inertial Odometry and Depth Estimation,” Neural Networks, vol. 150, pp. 119–136, 2022, doi: 10.1016/j.neunet.2022.03.005.

S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, “Keyframe-Based Visual-Inertial Odometry Using Nonlinear Optimization,” The International Journal of Robotics Research, vol. 34, no. 3, 2014, doi: 10.1177/0278364914554813.

M. Bloesch, M. Burri, S. Omari, M. Hutter, and R. Siegwart, “IEKF-based Visual-Inertial Odometry using Direct Photometric Feedback,” The International Journal of Robotics Research, vol. 36, no. 10, pp. 1053–1072, 2017, doi: 10.3929/ethz-b-000187364.

Y. Xiao, L. Li, X. Li and J. Yao, “DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view Structure from Motion,” 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10643-10650, 2022, doi: 10.1109/IROS47612.2022.9981975.

G. Zhai, L. Liu, L. Zhang, Y. Liu, and Y. Jiang, “PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning,” Pattern Recognition, vol. 102, 2020, doi: 10.1016/j.patcog.2019.107187.

R. Zhu, M. Yang, W. Liu, R. Song, B. Yan, and Z. Xiao, “DeepAVO: Efficient pose refining with feature distilling for deep visual odometry,” Neurocomputing, vol. 467, pp. 22–35, 2022, doi: 10.1016/j.neucom.2021.09.029.

F. A. Muhammet, D. Akif, Y. Abdullah, and Y. Alper, “HVIOnet: A deep learning based hybrid visual–inertial odometry approach for unmanned aerial system position estimation,” Neural Networks, vol. 155, pp. 461-474, 2022, doi: 10.1016/j.neunet.2022.09.001.

X. Haixin, L. Yiyou, H. Zeng, Q. Li, H. Liu, B. Fan, and C. Li, “Robust self-supervised monocular visual odometry based on prediction-update pose estimation network,” Engineering Applications of Artificial Intelligence, vol. 116, 2022, doi: 10.1016/j.engappai.2022.105481.

Y. Lu, Y. Chen, D. Zhao, and D. Li, “MGRL: Graph neural network based inference in a Markov network with reinforcement learning for visual navigation,” Neurocomputing, vol. 421, pp. 140–150, 2021, doi: 10.1016/j.neucom.2020.07.091.

A. J. Ali, M. Kouroshli, S. Semenova, Z. S. Hashemifar, S. Y. Ko, and K. Dantu, “Edge-SLAM: Edge-Assisted Visual Simultaneous Localization and Mapping,” ACM Transactions on Embedded Computing Systems, vol. 22, no. 1, pp. 1–31, 2022, doi: 10.1145/3561972.

M. Kegeleirs, G. Grisetti, and M. Birattari, “Swarm SLAM: Challenges and Perspectives,” Frontiers in Robotics and AI, vol. 8, pp. 1–6, 2021, doi: 10.3389/frobt.2021.618268.

P. -Y. Lajoie, B. Ramtoula, Y. Chang, L. Carlone and G. Beltrame, “DOOR-SLAM: Distributed, Online, and Outlier Resilient SLAM for Robotic Teams,” in IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1656-1663, 2020, doi: 10.1109/LRA.2020.2967681.

M. N. Osman Zahid and L. J. Hao, “A Study on Obstacle Detection For IoT Based Automated Guided Vehicle (AGV),” Mekatronika, vol. 4, no. 1, pp. 30–41, 2022, doi: 10.15282/mekatronika.v4i1.7534.

S. Buck, R. Hanten, K. Bohlmann and A. Zell, “Generic 3D obstacle detection for AGVs using time-of-flight cameras,” 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4119-4124, 2016, doi: 10.1109/IROS.2016.7759606.




DOI: https://doi.org/10.18196/jrc.v5i4.22061



Copyright (c) 2024 Van-Hung Le

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

 


Journal of Robotics and Control (JRC)

P-ISSN: 2715-5056 || E-ISSN: 2715-5072
Organized by Peneliti Teknologi Teknik Indonesia
Published by Universitas Muhammadiyah Yogyakarta in collaboration with Peneliti Teknologi Teknik Indonesia, Indonesia and the Department of Electrical Engineering
Website: http://journal.umy.ac.id/index.php/jrc
Email: jrcofumy@gmail.com

