Adaptive Semi-Unsupervised Weighted Oversampling with Sparsity Factor for Imbalanced Biomedical Data

Authors

  • Haseeb Ali, Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor
  • Nurul Ashikin Samat, Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor
  • Hafiz Maaz Ashghar

Keywords:

Imbalanced data, Majority class, Data mining, Classification

Abstract

The recent surge in data has increased the complexity of the class imbalance problem. Real-world applications in industry, medicine, and banking often generate imbalanced data, which degrades the prediction of the minority class by machine learning algorithms and can lead to high costs and even risks to life. In the biomedical field in particular, the diagnosis of cancer patients suffers from an uneven distribution of samples across classes. This study proposes an oversampling method that addresses both between-class and within-class imbalance. The method removes noise from the datasets and oversamples the significant concepts of the minority class. It uses a sparsity factor, which significantly improved the learnability of the classification model by generating a larger number of appropriate synthetic samples in the minority class. To evaluate the method, a series of experiments was performed on four imbalanced microarray datasets. The classification results show that the proposed method outperforms the existing baseline techniques.
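
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea (filter noisy minority samples, weight the remainder by how sparse their neighbourhood is, then generate synthetic samples). It is not the authors' method. Everything in it is an assumption made for illustration: the function names `pairwise_dist` and `sparsity_weighted_oversample`, the noise rule (all k nearest points belong to the majority class), the sparsity factor (mean distance to the k nearest minority neighbours), and the SMOTE-style interpolation used in place of the paper's cluster-based procedure.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def sparsity_weighted_oversample(X_min, X_maj, n_new, k=3, seed=0):
    """Hypothetical sketch: returns (synthetic samples, sampling weights)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)

    # Step 1, noise filter (assumed rule): drop a minority sample when its
    # k nearest other samples all belong to the majority class.
    d_min = pairwise_dist(X_min, X_min)
    np.fill_diagonal(d_min, np.inf)  # a sample is not its own neighbour
    d_all = np.hstack([d_min, pairwise_dist(X_min, X_maj)])
    nearest = np.argsort(d_all, axis=1)[:, :k]
    is_noise = (nearest >= len(X_min)).all(axis=1)
    kept = X_min[~is_noise]
    if len(kept) < 2:
        raise ValueError("need at least two non-noisy minority samples")

    # Step 2, sparsity factor (assumed definition): mean distance to the
    # k nearest remaining minority neighbours; sparser samples get more weight.
    d_kept = pairwise_dist(kept, kept)
    np.fill_diagonal(d_kept, np.inf)
    k_eff = min(k, len(kept) - 1)
    nbr_idx = np.argsort(d_kept, axis=1)[:, :k_eff]
    sparsity = np.take_along_axis(d_kept, nbr_idx, axis=1).mean(axis=1)
    total = sparsity.sum()
    if total > 0:
        weights = sparsity / total
    else:
        weights = np.full(len(kept), 1.0 / len(kept))

    # Step 3, SMOTE-style interpolation: pick seeds according to the weights,
    # then place each synthetic sample on the segment between the seed and
    # one of its minority neighbours.
    seeds = rng.choice(len(kept), size=n_new, p=weights)
    partners = nbr_idx[seeds, rng.integers(0, k_eff, size=n_new)]
    gaps = rng.random((n_new, 1))
    synthetic = kept[seeds] + gaps * (kept[partners] - kept[seeds])
    return synthetic, weights
```

Calling `sparsity_weighted_oversample(X_min, X_maj, n_new=20)` returns 20 synthetic minority samples together with the per-sample weights, so it is possible to inspect which regions of the minority class received the most new samples. A cluster-based variant, closer to what the title suggests, would compute the weights per sub-cluster rather than per sample.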

Published

04-03-2020

Section

Articles

How to Cite

Ali, H., Samat, N. A., & Ashghar, H. M. (2020). Adaptive Semi-Unsupervised Weighted Oversampling with Sparsity Factor for Imbalanced Biomedical Data. Journal of Soft Computing and Data Mining, 1(1), 17-26. https://publisher.uthm.edu.my/ojs/index.php/jscdm/article/view/6827