Enhanced Xception Model for Deepfake Detection: Integrating CBAM, Contrastive Learning, and a Stacking Classifier

B N Jyothi; M A Jabbar

doi:10.18196/jrc.v6i4.25811

Authors

B N Jyothi, Vardhaman College of Engineering
M A Jabbar, Vardhaman College of Engineering

Keywords:

Deepfake Detection, Convolutional Block Attention Module, Contrastive Learning, Ensemble Learning, Domain Adaptation, Cross Data Set Generalization

Abstract

Deepfake detection has become increasingly vital in an era of sophisticated fake-media generation techniques, and the threats posed by deepfakes make reliable detection a necessity. Deepfake detection has been researched extensively, but problems such as resource-intensive models and poor generalizability across datasets persist. To address these problems, we propose a framework that builds on transfer learning and the lightweight architecture of the Xception model. The framework consists of three major steps. The first step is a feature extractor that uses the pretrained Xception as its backbone and has two branches, one for global and one for local feature extraction. The global branch uses the pretrained Xception directly, while the local branch uses an Xception model enhanced with the Convolutional Block Attention Module (CBAM) to extract deepfake-specific features effectively, and with contrastive learning to give Xception discriminative power for feature extraction. In the second step, once the local and global features are extracted, two separate Random Forest classifiers are trained, one on each feature set. Finally, the predicted probabilities from these two models are combined by a logistic regression meta-model. To avoid the effects of class imbalance on model performance, the samples in each class were balanced through augmentation. The model is trained on the FaceForensics++ dataset and evaluated in a cross-dataset setting on the Celeb-DF and UADFV datasets. Because generalization across datasets is a major challenge for deepfake detection models, we integrate domain adaptation, with which our model performs noticeably well after minimal fine-tuning using 10% of the data. The proposed framework showed significant improvements, with a 5% increase in accuracy, a 1% increase in ROC, and a 2% increase in precision compared to state-of-the-art (SOTA) models.
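
The final stacking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two Random Forest classifiers are replaced by a hypothetical simulate_base_probs function that emits noisy fake-class probabilities, and the logistic regression meta-model is fitted with plain batch gradient descent. All function names, noise levels, the learning rate, and the epoch count are assumptions chosen for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_base_probs(labels, noise, rng):
    """Hypothetical stand-in for one base classifier's P(fake) outputs."""
    probs = []
    for y in labels:
        p = (0.7 if y == 1 else 0.3) + rng.gauss(0.0, noise)
        probs.append(min(max(p, 0.01), 0.99))  # keep values inside (0, 1)
    return probs


def fit_meta_model(p_global, p_local, labels, lr=1.0, epochs=1000):
    """Fit logistic regression on [p_global, p_local] by batch gradient descent."""
    w_g, w_l, b = 0.0, 0.0, 0.0
    n = len(labels)
    for _ in range(epochs):
        g_g = g_l = g_b = 0.0
        for pg, pl, y in zip(p_global, p_local, labels):
            err = sigmoid(w_g * pg + w_l * pl + b) - y  # log-loss gradient term
            g_g += err * pg
            g_l += err * pl
            g_b += err
        w_g -= lr * g_g / n
        w_l -= lr * g_l / n
        b -= lr * g_b / n
    return w_g, w_l, b


def stacked_prob(model, pg, pl):
    """Meta-model probability that a sample is fake."""
    w_g, w_l, b = model
    return sigmoid(w_g * pg + w_l * pl + b)


def accuracy(model, p_global, p_local, labels):
    hits = 0
    for pg, pl, y in zip(p_global, p_local, labels):
        pred = 1 if stacked_prob(model, pg, pl) >= 0.5 else 0
        hits += int(pred == y)
    return hits / len(labels)


rng = random.Random(0)
train_y = [i % 2 for i in range(400)]  # balanced labels: 1 = fake, 0 = real
test_y = [i % 2 for i in range(200)]

# Assumed noise levels; the "global" branch is arbitrarily made less noisy.
train_pg = simulate_base_probs(train_y, 0.15, rng)
train_pl = simulate_base_probs(train_y, 0.20, rng)
test_pg = simulate_base_probs(test_y, 0.15, rng)
test_pl = simulate_base_probs(test_y, 0.20, rng)

meta = fit_meta_model(train_pg, train_pl, train_y)
test_acc = accuracy(meta, test_pg, test_pl, test_y)
print("meta-model (w_global, w_local, bias):", meta)
print("held-out accuracy on simulated data:", test_acc)
```

In a fuller pipeline the simulated inputs would be replaced by the probabilities produced by the two trained classifiers. A common stacking practice, which the abstract does not confirm for this work, is to fit the meta-model on out-of-fold predictions so that it does not learn from probabilities the base models produced on their own training samples.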
