Handling Imbalanced Datasets in Machine Learning: Challenges, Approaches, and Best Practices
Keywords:
Imbalanced dataset, machine learning, resampling, data augmentation
Abstract
A machine learning model is usually judged by how accurately it predicts, typically summarised in a single accuracy measure. Accuracy alone is not sufficient, however: characteristics of the data, such as its quality and class balance, must also be examined. A model can be biased toward particular predictions and still produce a high percentage of correct predictions while performing poorly overall. Datasets may be balanced or imbalanced. An imbalanced dataset is one in which a minority class has far fewer samples than the majority class. This makes the model more likely to favour the majority class, leading to biased predictions and poor performance on the minority class. It is therefore essential to address class imbalance so that the model can make more accurate predictions, particularly for the minority class. The literature offers several methods for dealing with this problem, including resampling, which oversamples the minority class, undersamples the majority class, or combines the two techniques. This paper lists the existing methods for overcoming the dataset imbalance problem in machine learning.
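The two ideas in the abstract, that accuracy can hide poor minority-class performance and that resampling rebalances the classes, can be illustrated with a minimal sketch. It is not taken from the paper: the function names (`majority_baseline_scores`, `random_oversample`, `random_undersample`), the 95/5 class split, and the synthetic samples are all assumptions chosen for illustration, and only the simplest random variants of resampling are shown, using the Python standard library.

```python
import random
from collections import Counter


def majority_baseline_scores(labels, minority):
    """Accuracy and minority recall of a model that always predicts the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    predictions = [majority] * len(labels)
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    minority_total = sum(y == minority for y in labels)
    minority_hits = sum(p == y == minority for p, y in zip(predictions, labels))
    return accuracy, minority_hits / minority_total


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    pairs = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        pairs.extend((x, y) for x in xs + extra)
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of larger classes, down to the smallest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    pairs = []
    for y, xs in by_class.items():
        pairs.extend((x, y) for x in rng.sample(xs, target))
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


# Synthetic data (an assumption for illustration): 95 majority samples, 5 minority samples.
samples = list(range(100))
labels = [0] * 95 + [1] * 5

accuracy, minority_recall = majority_baseline_scores(labels, minority=1)
print(accuracy, minority_recall)  # high accuracy, but no minority sample is found

over_samples, over_labels = random_oversample(samples, labels)
under_samples, under_labels = random_undersample(samples, labels)
print(Counter(over_labels), Counter(under_labels))
```

On this toy data the always-majority baseline reaches 95% accuracy with zero minority recall, which is the bias the abstract describes. Oversampling keeps every original sample at the cost of duplicates, while undersampling discards majority samples; which trade-off is acceptable depends on the dataset size and the application.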
License
Copyright (c) 2024 Journal of Applied Science, Technology and Computing

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


