Ensemble learning with imbalanced data handling in the early detection of capital markets

Putri Auliana Rifqi Mukhlashin, Anwar Fitrianto, Agus M Soleh, Wan Zuki Azman Wan Muhamad

Abstract


Research aims: This study aims to create an early detection model to predict events in the Indonesian capital market.
Design/Methodology/Approach: This quantitative study compares ensemble learning models combined with imbalanced-data handling for the early detection of capital market events. Five ensemble learning models (Random Forest, ExtraTrees, CatBoost, XGBoost, and LightGBM) were used to detect events in the Indonesian capital market early. Each was paired with an imbalanced-data handling method: undersampling (RUS), oversampling (SMOTE, Borderline-SMOTE, ADASYN), combined over- and undersampling (SMOTE-Tomek, SMOTE-ENN), or weighting (class weights). The predictors covered global and regional stock markets, commodities, exchange rates, technical indicators, sectoral indices, JCI leaders, MSCI, net foreign buying of stocks, national securities, and national share ownership. The binary response was derived from the lowest return under the Crisis Management Protocol (CMP).
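The resampling and weighting families named above can be illustrated with simplified plain-Python versions. This is a minimal sketch, not the authors' pipeline: it assumes binary labels with 1 marking the minority "event" class, uses a simplified SMOTE that draws base points at random, and the function names `random_undersample`, `smote_oversample`, and `balanced_class_weights` are hypothetical. Practical work would more typically rely on a library such as imbalanced-learn, which provides RUS, SMOTE, Borderline-SMOTE, ADASYN, SMOTE-Tomek, and SMOTE-ENN samplers.

```python
import math
import random

# Hypothetical helpers for illustration; label 1 is ASSUMED to be the minority "event" class.

def random_undersample(X, y, seed=0):
    """RUS: randomly keep as many majority rows as there are minority rows."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE: add synthetic event rows on the line segment between a random
    event row and one of its k nearest event neighbours until the classes balance."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]  # needs at least 2 event rows
    n_needed = max(y.count(0) - len(minority), 0)
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return X + synthetic, y + [1] * len(synthetic)


def balanced_class_weights(y):
    """Class weight = n_samples / (n_classes * class_count), so rarer classes weigh more."""
    counts = {c: y.count(c) for c in set(y)}
    return {c: len(y) / (len(counts) * n) for c, n in counts.items()}


# Toy data: 8 normal days (0) and 2 event days (1) with two made-up features each.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.2],
     [0.5, 0.5], [0.6, 0.4], [0.7, 0.6], [2.0, 2.0], [2.2, 2.1]]
y = [0] * 8 + [1] * 2

X_rus, y_rus = random_undersample(X, y)      # 2 events + 2 normal days
X_smote, y_smote = smote_oversample(X, y)    # 8 events + 8 normal days
weights = balanced_class_weights(y)          # {0: 0.625, 1: 2.5}
```

The combined methods (SMOTE-Tomek, SMOTE-ENN) follow SMOTE with a cleaning step that removes Tomek links or points misclassified by edited nearest neighbours; they are omitted here for brevity. Whatever the method, resampling is normally applied only to training data, never to validation or test data.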
Research findings: Hyperparameters and classification thresholds were tuned to obtain the optimal model, and the best model was selected by the highest G-mean. ExtraTrees with SMOTE-ENN performed best for one-day-ahead events, with a G-mean of 96.88%. LightGBM with SMOTE performed best for five-day-ahead events, with a G-mean of 89.21%. CatBoost with Borderline-SMOTE performed best for 15-day-ahead events, with a G-mean of 89.49%. LightGBM with SMOTE-Tomek performed best for 30-day-ahead events, with a G-mean of 68.02%. Overall, performance scores generally declined as the prediction horizon lengthened.
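Model selection above relies on the G-mean, the geometric mean of sensitivity (recall on event days) and specificity (recall on non-event days), reported in the abstract as a percentage. The sketch below computes it and picks a decision threshold by grid search. The grid of cut-offs from 0.01 to 0.99 and the function names `g_mean` and `tune_threshold` are illustrative assumptions, not the study's exact tuning procedure.

```python
import math

# Hypothetical helpers; labels are 1 = event, 0 = no event.

def g_mean(y_true, y_pred):
    """G-mean = sqrt(sensitivity * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def tune_threshold(y_true, scores, grid=None):
    """Return (threshold, G-mean) for the cut-off that maximises G-mean.
    `scores` are predicted event probabilities; ties keep the lowest threshold."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # ASSUMED grid: 0.01 ... 0.99
    best_thr, best_gm = grid[0], -1.0
    for thr in grid:
        y_pred = [1 if s >= thr else 0 for s in scores]
        gm = g_mean(y_true, y_pred)
        if gm > best_gm:
            best_thr, best_gm = thr, gm
    return best_thr, best_gm


# Toy validation set: 6 normal days and 2 event days with made-up model scores.
y_val = [0, 0, 0, 0, 0, 0, 1, 1]
p_val = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.55, 0.90]
thr, gm = tune_threshold(y_val, p_val)  # thr = 0.41, gm = sqrt(5/6) ≈ 0.913
```

The threshold would normally be chosen on validation data and then fixed before scoring the test set, so that the reported G-mean is not optimistically biased.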
Theoretical contribution/Originality: This work applies a wider range of imbalance-handling methods, combined with ensemble learning, to early detection in the capital market.
Practitioner/Policy implication: The capital market is an indicator of economic stability. Maintaining the effectiveness of the capital market and its economic value requires a system that detects market pressure early.
Research limitation/Implication: This study used ensemble learning models to predict capital market events 1, 5, 15, and 30 days ahead, counted in Indonesian working days. The model's forecasts are intended to support capital market monitoring and the taking of precautionary measures.
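Because events are forecast 1, 5, 15, and 30 working days ahead, each horizon needs its own target column aligned to trading-day rows rather than calendar days. The abstract does not state the CMP cut-off, nor whether an event counts only on day t + h or anywhere within the next h days. The sketch below therefore assumes a hypothetical return cut-off and the "exactly h trading days ahead" reading; `make_event_labels` and `shift_target` are hypothetical names.

```python
# Hypothetical helpers; the cut-off is an ASSUMED placeholder, not the actual CMP threshold.

def make_event_labels(lowest_returns, cutoff):
    """Mark a trading day as an event (1) when its lowest return is at or below `cutoff`."""
    return [1 if r <= cutoff else 0 for r in lowest_returns]


def shift_target(features, events, horizon):
    """Pair the features of trading day t with the event label of day t + horizon.
    Rows are trading days, so the shift skips weekends and holidays automatically.
    The last `horizon` days have no future label and are dropped."""
    if horizon < 1:
        raise ValueError("horizon must be at least one trading day")
    n = max(len(events) - horizon, 0)
    return features[:n], events[horizon:horizon + n]


# Toy series of 6 trading days with made-up lowest returns and an illustrative cut-off.
lowest_returns = [0.01, -0.03, 0.00, -0.05, 0.02, -0.01]
features = [[day] for day in range(6)]
events = make_event_labels(lowest_returns, cutoff=-0.02)  # [0, 1, 0, 1, 0, 0]

datasets = {h: shift_target(features, events, h) for h in (1, 5, 15, 30)}
# With only 6 days, the 15- and 30-day horizons yield no usable rows.
```

When splitting such data, a chronological split (earlier days for training, later days for testing) avoids leaking future information into the model.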


Keywords


Capital Market; Early Detection; Ensemble Learning; Imbalance Class; Risk Event



DOI: https://doi.org/10.18196/jai.v24i2.17970






Journal of Accounting and Investment is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
