Systematic Review on Missing Data Imputation Techniques with Machine Learning Algorithms for Healthcare

Amelia Ritahani Ismail; Nadzurah Zainal Abidin; Mhd Khaled Maen

doi:10.18196/jrc.v3i2.13133

Authors

Amelia Ritahani Ismail, International Islamic University Malaysia
Nadzurah Zainal Abidin, International Islamic University Malaysia
Mhd Khaled Maen

Keywords:

Review, Missing Data Imputation, Machine Learning, Healthcare

Abstract

Missing data is one of the most common issues encountered in the data cleaning process, especially when dealing with medical datasets. Real-world collected datasets are prone to being incomplete, inconsistent, noisy and redundant for reasons such as human error, instrument failure and patient death. To deal accurately with incomplete data, sophisticated algorithms are therefore needed to impute the missing values. Many machine learning algorithms have been applied to impute missing data with plausible values. Among them, the k-nearest neighbours (KNN) algorithm has been widely adopted for missing data imputation because of its robustness and simplicity, and it has shown promise in outperforming other machine learning methods. This paper provides a comprehensive review of the imputation techniques used to replace missing data. The goal of the review is to draw attention to potential improvements to existing methods and to give readers a better grasp of trends in imputation techniques.
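
As an illustration of the KNN imputation idea described above, the following is a minimal sketch in plain Python. It is not the authors' method. It makes several assumptions:

- features are numeric;
- missing entries are encoded as None;
- distance is Euclidean, computed over the features observed in both records and rescaled to the full number of features;
- each gap is filled with the unweighted mean of the k nearest donor records.

The function names and toy records are hypothetical. Library implementations such as scikit-learn's KNNImputer follow a similar scheme with more options.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def partial_distance(a: Row, b: Row) -> float:
    """Euclidean distance over features observed in both rows, rescaled to
    the full number of features. Returns inf when no feature is shared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(data: List[Row], k: int = 2) -> List[Row]:
    """Return a copy of `data` where each missing entry is replaced by the
    mean of that feature over the k nearest rows that observe it."""
    n_features = len(data[0])

    # Column means serve as a fallback when a feature has no usable donor.
    col_means = []
    for j in range(n_features):
        observed = [row[j] for row in data if row[j] is not None]
        col_means.append(sum(observed) / len(observed) if observed else 0.0)

    completed = []
    for i, row in enumerate(data):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows with feature j observed and a finite distance.
            donors = []
            for idx, other in enumerate(data):
                if idx == i or other[j] is None:
                    continue
                dist = partial_distance(row, other)
                if dist != math.inf:
                    donors.append((dist, other[j]))
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                new_row[j] = sum(v for _, v in nearest) / len(nearest)
            else:
                new_row[j] = col_means[j]
        completed.append(new_row)
    return completed


if __name__ == "__main__":
    records = [
        [1.0, 2.0, 3.0],
        [1.1, 2.1, None],  # third feature is missing
        [0.9, 1.9, 2.9],
        [8.0, 9.0, 10.0],
    ]
    print(knn_impute(records, k=2))
```

In this toy example, the second record's missing third feature is filled with the mean of its two nearest neighbours (3.0 and 2.9), giving 2.95. A plain column mean would give 5.3, because it is pulled toward the distant fourth record. The result also depends on the choice of k, on the distance function, and on whether features are scaled before distances are computed.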
