Real-Time Human Detection Using Deep Learning on Embedded Platforms: A Review

Wahyu Rahmaniar, Ari Hernawan


Detecting objects such as humans is central to image understanding in computer vision, and human detection in images provides essential information for a wide variety of intelligent-system applications. In this paper, human detection is carried out using deep learning, which has developed rapidly and achieved remarkable success in many object detection tasks. Recently, several embedded systems have emerged as powerful computing boards that deliver high processing capability through a graphics processing unit (GPU). This paper provides a comprehensive survey of recent deep learning achievements in this field on embedded platforms. NVIDIA Jetson was chosen as a low-power system designed to accelerate deep learning applications. The review highlights the performance of human detection models such as PedNet, multiped, SSD MobileNet V1, SSD MobileNet V2, and SSD Inception V2 on edge computing hardware, giving an overview of these methods and comparing their accuracy and computation time for real-time applications. The experimental results show that, on our video datasets covering several scenarios, SSD MobileNet V2 provides the highest accuracy and the fastest computation time among the compared models.
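
The abstract compares detectors by accuracy and computation time but does not state the exact protocol, so the following is only a minimal sketch of how such a comparison might be set up. It assumes a hypothetical `detector` callable that maps a frame to a list of `(x_min, y_min, x_max, y_max)` boxes, an IoU threshold of 0.5 for counting a detection as correct, and precision/recall as the accuracy measures; none of these choices are taken from the paper.

```python
import time
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(pred: Sequence[Box], truth: Sequence[Box],
                     thr: float = 0.5) -> Tuple[int, int, int]:
    """Greedy one-to-one matching; returns (true pos, false pos, false neg)."""
    unmatched = list(truth)
    tp = 0
    for p in pred:
        # Pick the remaining ground-truth box that overlaps this prediction most.
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thr:
            unmatched.remove(best)
            tp += 1
    return tp, len(pred) - tp, len(unmatched)


def evaluate(detector: Callable[[object], List[Box]],
             frames: Sequence[object],
             ground_truth: Sequence[Sequence[Box]],
             thr: float = 0.5) -> dict:
    """Accumulate precision, recall, and timing over a sequence of frames."""
    tp = fp = fn = 0
    elapsed = 0.0
    for frame, truth in zip(frames, ground_truth):
        start = time.perf_counter()
        pred = detector(frame)  # only the detector call is timed
        elapsed += time.perf_counter() - start
        a, b, c = match_detections(pred, truth, thr)
        tp, fp, fn = tp + a, fp + b, fn + c
    n = len(frames)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "mean_time_s": elapsed / n if n else 0.0,
        "fps": n / elapsed if elapsed > 0 else float("inf"),
    }
```

Running `evaluate` once per model on the same frames and annotations would give directly comparable precision, recall, and frames-per-second figures; on a real board, a few warm-up frames would normally be discarded before timing.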


computer vision; convolutional neural network; deep learning; human detection; NVIDIA Jetson; object detection

Copyright (c) 2021 Journal of Robotics and Control (JRC)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Journal of Robotics and Control (JRC)

P-ISSN: 2715-5056 || E-ISSN: 2715-5072
Organized by Lembaga Penelitian, Publikasi & Pengabdian Masyarakat UMY, Yogyakarta, Indonesia
Published by Universitas Muhammadiyah Yogyakarta, Yogyakarta, Indonesia