Handling Imbalanced Datasets in Machine Learning: Challenges, Approaches, and Best Practices

Rusma Anieza Ruslan; Nureize Arbaiy

Keywords:

Imbalanced dataset, machine learning, resampling, data augmentation

Abstract

The performance of a machine learning model is usually judged by how accurately it predicts, as measured by accuracy. Accuracy alone is not sufficient, however: characteristics of the data, such as its quality and class balance, must also be examined. A model can be biased toward particular predictions, producing a high percentage of correct predictions while performing poorly overall. A dataset may be balanced or imbalanced. An imbalanced dataset is one in which a minority class has far fewer samples than the majority class. A model trained on such data is more likely to favour the majority class, leading to biased predictions and poor performance on the minority class. It is therefore essential to address class imbalance so that the model predicts the minority class more accurately. The literature offers several methods for dealing with this problem, including resampling, which involves oversampling the minority class, undersampling the majority class, or combining the two techniques. This paper lists the existing methods for overcoming the dataset imbalance problem in machine learning.
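To make the two central points of the abstract concrete, the following is a minimal sketch using only the Python standard library. It is an illustration, not the authors' implementation. The 95:5 class ratio, the integer stand-in samples, and the helper names (`accuracy`, `recall`, `random_oversample`, `random_undersample`) are all assumptions introduced here. The sketch first shows how a model that always predicts the majority class can score high accuracy while missing every minority sample, and then shows the simplest random forms of oversampling and undersampling.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return hits / sum(t == positive for t in y_true)


def _group_by_class(samples, labels):
    """Map each class label to the list of samples carrying that label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(xs) for xs in groups.values())
    out = []
    for y, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    return [x for x, _ in out], [y for _, y in out]


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(xs) for xs in groups.values())
    out = []
    for y, xs in groups.items():
        out.extend((x, y) for x in rng.sample(xs, target))
    return [x for x, _ in out], [y for _, y in out]


# Hypothetical dataset: 95 majority samples (class 0), 5 minority samples (class 1).
labels = [0] * 95 + [1] * 5
samples = list(range(100))  # integers stand in for feature vectors

# A "model" that always predicts the majority class.
always_majority = [0] * len(labels)
print(accuracy(labels, always_majority))            # 0.95
print(recall(labels, always_majority, positive=1))  # 0.0

_, over_labels = random_oversample(samples, labels)
_, under_labels = random_undersample(samples, labels)
print(Counter(over_labels))   # both classes now have 95 samples
print(Counter(under_labels))  # both classes now have 5 samples
```

Random oversampling only repeats existing minority samples and random undersampling discards majority samples, so both change the class counts without adding new information. In practice, resampling is normally applied to the training split only, so that evaluation still reflects the original class distribution.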

Published

12-11-2024

Issue

Vol. 1 No. 2 (2024)

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

How to Cite

Ruslan, R. A., & Arbaiy, N. (2024). Handling Imbalanced Datasets in Machine Learning: Challenges, Approaches, and Best Practices. Journal of Applied Science, Technology and Computing, 1(2), 20-27. https://publisher.uthm.edu.my/ojs/index.php/jastec/article/view/18626
