International Journal on Science and Technology

E-ISSN: 2229-7677     Impact Factor: 9.88

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

Assessing the Effectiveness of Machine Learning Classifiers in Handling Imbalanced Datasets

Author(s) Mr. Japheth Kodua Wiredu, Stephen Akobre, Fuseini Jibreel, Abdul-Rahaman Abubakari
Country Ghana
Abstract This paper compares supervised machine learning classifiers on imbalanced datasets in terms of predictive performance, computational efficiency, and robustness to data defects. The experimental datasets varied in size, class ratio, and quality. The classifiers tested were Logistic Regression, Naive Bayes, Support Vector Machines (SVM), Decision Trees, Random Forests, and Gradient Boosting. Gradient Boosting gave the best predictive performance, with a mean accuracy of 94.2% and an F1-score of 0.92, but at a high computational cost: approximately 210 seconds of training on a medium-size dataset. Random Forests were highly robust, retaining more than 88% accuracy with 15% injected noise as well as missing values, which makes them well suited to imperfect, noisy imbalanced data. Logistic Regression and linear SVMs were the most computationally efficient, training in less than 3 seconds, 5-10 times faster than the ensemble algorithms, with accuracy between 85% and 87%. The findings show that no classifier is universally best for imbalanced learning problems. Classifier choice should instead be based on the needs of the application, including accuracy requirements, degree of class imbalance, noise tolerance, and computational constraints. This work offers a replicable benchmarking framework and practical recommendations for choosing a classifier when data are imbalanced.
Keywords Imbalanced Datasets, Machine Learning Classifiers, Supervised Learning, Classification Performance, Robustness Analysis, Computational Efficiency, Ensemble Methods
Field Computer > Artificial Intelligence / Simulation / Virtual Reality
Published In Volume 17, Issue 1, January-March 2026
Published On 2026-02-11
DOI https://doi.org/10.71097/IJSAT.v17.i1.10291
Short DOI https://doi.org/hbn7zj
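
The abstract reports both accuracy and F1-score. The following is a minimal sketch, not taken from the paper, of why the two can diverge on imbalanced data. The label vectors, the 950:50 class ratio, and the names `score`, `majority`, and `detector` are all hypothetical and chosen only for illustration; they are not the paper's datasets or results.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against empty denominators (e.g. no positive predictions at all).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores((tp + tn) / len(pairs), precision, recall, f1)


# Hypothetical imbalanced labels: 50 minority (1) and 950 majority (0) samples.
y_true = [1] * 50 + [0] * 950

# Baseline that always predicts the majority class.
majority = [0] * 1000

# Hypothetical classifier: finds 40 of 50 positives, raises 20 false alarms.
detector = [1] * 40 + [0] * 10 + [1] * 20 + [0] * 930

for name, y_pred in [("majority", majority), ("detector", detector)]:
    s = score(y_true, y_pred)
    print(f"{name}: accuracy={s.accuracy:.3f} f1={s.f1:.3f}")
```

In this toy setup the always-majority baseline reaches 95% accuracy while its F1-score on the minority class is zero, which is one reason a benchmark on imbalanced data would report F1 alongside accuracy.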
