Leveraging a Two-Level Attention Mechanism for Deep Face Recognition with Siamese One-Shot Learning

Arkan Mahmood Albayati; Wael Chtourou; Faouzi Zarai

doi:10.18196/jrc.v5i1.20135

Authors

Arkan Mahmood Albayati, University of Sfax
Wael Chtourou, University of Sfax
Faouzi Zarai, University of Sfax

Keywords:

One-shot, Siamese Network, Triplet Loss, Contrastive Loss, Attention.

Abstract

Discriminative feature embeddings underpin large-scale face recognition. Many image-based face recognition networks are built on CNNs such as ResNets and VGG nets. Humans prioritise different facial elements, whereas CNNs treat all parts of a face image equally. In NLP and computer vision, attention is used to learn the most important parts of an input signal. This study uses an inter-channel and inter-spatial attention mechanism to assess the significance of the components of a face image. In channel attention for face recognition, the per-channel scalars are usually computed with Global Average Pooling (GAP). A recent study found that GAP captures only the lowest-frequency information in each channel. We therefore compress each channel with the discrete cosine transform (DCT) instead of a single scalar, so that the channel attention mechanism can also evaluate information at frequencies other than the lowest. After spatial attention is applied, the resulting feature map is passed to later layers. Together, channel and spatial attention improve CNN feature extraction for face recognition. The attention blocks can be applied channel-only, spatial-only, in parallel, sequentially, or channel-after-spatial. On public datasets (Labeled Faces in the Wild), this approach may outperform current attention-based face recognition methods.
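The relationship between GAP and the DCT described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `dct_basis` and `dct_channel_descriptor`, the unnormalised 2D DCT-II basis, the 7x7 feature map size, and the choice of three frequency indices are all assumptions made for illustration.

```python
import numpy as np


def dct_basis(h, w, u, v):
    """Unnormalised 2D DCT-II basis for frequency index (u, v) on an h x w grid.

    Hypothetical helper for illustration; not taken from the paper.
    """
    ys = np.arange(h)
    xs = np.arange(w)
    by = np.cos(np.pi * u * (ys + 0.5) / h)  # vertical cosine, shape (h,)
    bx = np.cos(np.pi * v * (xs + 0.5) / w)  # horizontal cosine, shape (w,)
    return np.outer(by, bx)                  # shape (h, w)


def dct_channel_descriptor(x, freqs):
    """Compress each channel of x (C, H, W) into len(freqs) DCT coefficients.

    Returns an array of shape (C, len(freqs)).
    """
    _, h, w = x.shape
    bases = np.stack([dct_basis(h, w, u, v) for u, v in freqs])  # (F, H, W)
    return np.einsum("chw,fhw->cf", x, bases)


# Toy feature map: 4 channels of size 7x7 (sizes are assumptions).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 7, 7))

# Global Average Pooling gives one scalar per channel.
gap = x.mean(axis=(1, 2))

# DCT descriptor with the lowest frequency (0, 0) plus two higher ones.
freqs = [(0, 0), (0, 1), (1, 0)]
desc = dct_channel_descriptor(x, freqs)

# The (0, 0) basis is all ones, so its coefficient is H * W times the GAP value.
gap_matches_dc = np.allclose(desc[:, 0], gap * 7 * 7)
```

In this sketch the first column of `desc` carries the same information as GAP up to a constant factor, while the remaining columns expose the higher-frequency content that a single pooled scalar discards. How the coefficients are then turned into channel weights is left out, since the abstract does not specify it.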
