Handling Imbalanced Datasets in Machine Learning: Challenges, Approaches, and Best Practices

Authors

  • Rusma Anieza Ruslan
  • Nureize Arbaiy

Keywords:

Imbalanced dataset, machine learning, resampling, data augmentation

Abstract

The performance of a machine learning model is usually judged by how accurately it predicts, as measured by an accuracy metric. However, other characteristics, such as data quality and class balance, must also be examined. A model can be biased toward specific predictions, yielding a high percentage of correct predictions while performing poorly overall. Datasets may be balanced or imbalanced. An imbalanced dataset is one in which a minority class has few samples compared with the majority class. This makes the model more likely to favour the majority class, leading to biased predictions and poor performance on the minority class. It is therefore essential to address class imbalance so that the model can predict the minority class more reliably. The literature offers several methods for dealing with this problem, including resampling, which involves oversampling the minority class, undersampling the majority class, or combining the two techniques. This paper surveys the existing methods for overcoming the dataset imbalance problem in machine learning.
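As an illustration of the resampling idea described in the abstract, the following is a minimal sketch of plain random oversampling and random undersampling using only the Python standard library. It is not taken from the paper. The function names (`random_oversample`, `random_undersample`), the toy 95/5 dataset, and the fixed seed are all assumptions made for this example. The sketch also computes the accuracy of a trivial "always predict the majority class" baseline to show why accuracy alone can mislead on imbalanced data.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(samples, labels):
    """Map each class label to the list of samples carrying it."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return groups


def random_oversample(samples, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of the
    smaller classes until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = max(len(group) for group in groups.values())
    out_samples, out_labels = [], []
    for label, group in groups.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Hypothetical helper: keep a random subset of each class so that
    every class matches the size of the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = min(len(group) for group in groups.values())
    out_samples, out_labels = [], []
    for label, group in groups.items():
        # Draw without replacement, discarding the surplus samples.
        out_samples.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy data (assumed for illustration): 95 majority samples, 5 minority samples.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

# A model that always predicts the majority class is 95% accurate here,
# yet it never identifies a single minority sample.
majority_count = Counter(y).most_common(1)[0][1]
baseline_accuracy = majority_count / len(y)

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print("baseline accuracy:", baseline_accuracy)
print("original:", dict(Counter(y)))
print("oversampled:", dict(Counter(y_over)))
print("undersampled:", dict(Counter(y_under)))
```

In practice, resampling of this kind would normally be applied to the training split only, so that the evaluation data keeps its original class distribution. A combined approach could undersample the majority class to an intermediate size and then oversample the minority class up to it; that variant is not shown here.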

Published

12-11-2024

Section

Articles

How to Cite

Ruslan, R. A., & Arbaiy, N. (2024). Handling Imbalanced Datasets in Machine Learning: Challenges, Approaches, and Best Practices. Journal of Applied Science, Technology and Computing, 1(2), 20-27. https://publisher.uthm.edu.my/ojs/index.php/jastec/article/view/18626