Ensemble learning with imbalanced data handling in the early detection of capital markets
Abstract
Research aims: This study aims to create an early detection model to predict events in the Indonesian capital market.
Design/Methodology/Approach: This quantitative study compared ensemble learning models combined with imbalanced data handling for the early detection of capital market events. Five ensemble learning models (Random Forest, ExtraTrees, CatBoost, XGBoost, and LightGBM) were paired with imbalanced data handling methods: undersampling (RUS), oversampling (SMOTE, SMOTE-Border, ADASYN), combined over- and undersampling (SMOTE-Tomek, SMOTE-ENN), and class weighting. The predictors covered global and regional stock markets, commodities, exchange rates, technical indicators, sectoral indices, JCI leaders, MSCI, net buys of foreign stocks, national securities, and national share ownership. The response was binary and was based on the lowest return under the Crisis Management Protocol (CMP).
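The resampling idea behind two of the methods named above (random undersampling and SMOTE-style oversampling) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the toy data, and the choice of k are assumptions made for illustration, and a real study would more likely rely on an established library such as imbalanced-learn.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbors.
    (Illustrative helper, not a library API.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic

def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of n_keep majority points (RUS-style)."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)

# Toy data (assumed): 3 "event" rows versus 9 "no event" rows, two features each.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
majority = [[float(x), float(y)] for x in range(3) for y in range(3)]

oversampled_minority = minority + smote_sketch(minority, n_new=6)
undersampled_majority = random_undersample(majority, n_keep=3)
print(len(oversampled_minority), len(undersampled_majority))
```

Either route yields balanced classes in this toy case. Combined methods such as SMOTE-Tomek and SMOTE-ENN add a cleaning step after oversampling, which this sketch does not cover.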
Research findings: Hyperparameters and classification thresholds were tuned to obtain the optimum model, and the best model was the one with the highest G-Mean. ExtraTrees with SMOTE-ENN performed best for one-day events, with a G-Mean of 96.88%. LightGBM with SMOTE performed best for five-day events, with a G-Mean of 89.21%. CatBoost with SMOTE-Border performed best for 15-day events, with a G-Mean of 89.49%, and LightGBM with SMOTE-Tomek performed best for 30-day events, with a G-Mean of 68.02%. Overall, performance evaluation scores tended to decrease as the prediction horizon lengthened.
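The G-Mean used to rank models is the geometric mean of sensitivity and specificity, so a model scores well only when it detects both events and non-events. A minimal sketch of computing it and of picking a classification threshold by G-Mean follows. The helper names, the toy labels and scores, and the candidate grid are assumptions for illustration, not the study's data or tuning procedure.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on class 1) and
    specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

def best_threshold(y_true, scores, candidates):
    """Return (threshold, g_mean) for the candidate cutoff with the highest G-Mean.
    Ties keep the first (lowest) candidate encountered."""
    best = (None, -1.0)
    for thr in candidates:
        preds = [1 if s >= thr else 0 for s in scores]
        gm = g_mean(y_true, preds)
        if gm > best[1]:
            best = (thr, gm)
    return best

# Toy example (assumed): 8 non-events, 2 events, with predicted event probabilities.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.40, 0.80]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

thr, gm = best_threshold(y_true, scores, candidates)
print(thr, round(gm, 4))
```

In this toy case the default cutoff of 0.5 would miss one of the two events, while a lower cutoff recovers it at the cost of one false alarm. That trade-off is why threshold tuning matters on imbalanced data.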
Theoretical contribution/Originality: This work applies a broader set of imbalance handling methods, together with ensemble learning, to the early detection of capital market events.
Practitioner/Policy implication: Capital markets can indicate economic stability. Maintaining capital market efficacy and economic value requires a system to detect pressure.
Research limitation/Implication: This study used ensemble learning models to predict capital market events 1, 5, 15, and 30 days ahead, assuming Indonesian working days. The model's forecast results are expected to be utilized to monitor the capital market and take precautions.
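To make the 1-, 5-, 15-, and 30-day horizons concrete, the following sketch builds forward-looking binary labels from a series of daily returns on working days. The labeling rule is an assumption made for illustration: a day is flagged when the lowest daily return in the following window is at or below a cutoff. The abstract does not state the actual CMP rule or threshold, so the cutoff of -0.05, the function name, and the toy returns are all hypothetical.

```python
def horizon_labels(returns, horizon, cutoff):
    """Label day t as 1 if the lowest daily return over the next `horizon`
    working days (t+1 .. t+horizon) is at or below `cutoff`, else 0.
    Days without a full look-ahead window get None.
    (Assumed labeling rule, not the study's definition.)"""
    labels = []
    for t in range(len(returns)):
        window = returns[t + 1 : t + 1 + horizon]
        if len(window) < horizon:
            labels.append(None)
        else:
            labels.append(1 if min(window) <= cutoff else 0)
    return labels

# Toy daily returns on consecutive working days (assumed values).
returns = [0.004, -0.012, 0.003, -0.051, 0.010, 0.002, -0.007, 0.006]

labels_1d = horizon_labels(returns, horizon=1, cutoff=-0.05)
labels_5d = horizon_labels(returns, horizon=5, cutoff=-0.05)
print(labels_1d)
print(labels_5d)
```

Under this rule, longer horizons flag more days for the same underlying drop but leave fewer usable rows at the end of the sample.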
DOI: https://doi.org/10.18196/jai.v24i2.17970
Office:
Ruang Jurnal Fakultas Ekonomi dan Bisnis UMY
Gedung Ki Bagus Hadikusuma (E4) Lantai 2, Kampus Terpadu Universitas Muhammadiyah Yogyakarta,
Jalan Brawijaya (Lingkar Selatan), Tamantirto, Kasihan, Bantul, Daerah Istimewa Yogyakarta, Indonesia, 55183
Website: journal.umy.ac.id/index.php/ai - E-mail: jai@umy.ac.id
Journal of Accounting and Investment is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.