Adaptive Semi-Unsupervised Weighted Oversampling with Sparsity Factor for Imbalanced Biomedical Data

  • Haseeb Ali Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor
  • Nurul Ashikin Samat Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor
  • Hafiz Maaz Ashghar
Keywords: Imbalanced data, Majority class, Data mining, Classification

Abstract

The recent surge in data volume has increased the complexity of the class imbalance problem. Real-world applications in industry, medicine, and banking commonly generate imbalanced data, which degrades the prediction of the minority class by machine learning algorithms and can lead to high costs and even risks to life. In the biomedical field in particular, the diagnosis of cancer patients suffers from an uneven distribution of samples across classes. This study proposes an oversampling method to address both between-class and within-class imbalance. The method removes noise from the datasets and captures the significant concepts of the minority class for oversampling. It uses a sparsity factor, which significantly improved the learnability of the classification model by generating a larger number of appropriate synthetic samples in the minority class. To evaluate the method, a series of experiments was performed on four imbalanced microarray datasets. The classification results show that the proposed method outperforms the existing baseline techniques.
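
The abstract does not give the algorithm's details, so the following is only a rough illustrative sketch of the general idea (noise removal followed by sparsity-weighted synthetic sample generation), not the authors' method. Everything in it is an assumption made for illustration: the function name `sparsity_weighted_oversample`, the k-nearest-neighbour noise filter, the definition of sparsity as the mean distance to the k nearest minority neighbours, and the SMOTE-style interpolation used to create synthetic points.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distance matrix between the rows of a and the rows of b."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))

def sparsity_weighted_oversample(X, y, minority=1, k=3, seed=0):
    """Hypothetical sketch: filter noisy minority points, then oversample
    sparse minority regions more heavily until the classes are balanced."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X_min, X_maj = X[y == minority], X[y != minority]
    y_maj = y[y != minority]
    n_min = len(X_min)

    # 1) Noise filter (assumption): drop a minority point when none of its
    #    k nearest neighbours in the whole dataset belongs to the minority.
    X_all = np.vstack([X_min, X_maj])
    d_all = pairwise_dist(X_min, X_all)
    d_all[np.arange(n_min), np.arange(n_min)] = np.inf  # ignore self
    nn_all = np.argsort(d_all, axis=1)[:, :k]
    keep = (nn_all < n_min).any(axis=1)  # indices < n_min are minority rows
    X_min = X_min[keep]
    if len(X_min) < 2:
        raise ValueError("need at least two clean minority samples")

    # 2) Sparsity factor (assumption): mean distance to the k nearest
    #    minority neighbours; larger values mean a sparser neighbourhood.
    d_min = pairwise_dist(X_min, X_min)
    np.fill_diagonal(d_min, np.inf)
    k_eff = min(k, len(X_min) - 1)
    nn_min = np.argsort(d_min, axis=1)[:, :k_eff]
    sparsity = np.take_along_axis(d_min, nn_min, axis=1).mean(axis=1)
    total = sparsity.sum()
    weights = sparsity / total if total > 0 else np.full(len(X_min), 1 / len(X_min))

    # 3) Generate enough synthetic points to match the majority size,
    #    picking seed points in proportion to their sparsity weight and
    #    interpolating towards a random minority neighbour (SMOTE-style).
    n_new = max(len(X_maj) - len(X_min), 0)
    seeds = rng.choice(len(X_min), size=n_new, p=weights)
    partners = nn_min[seeds, rng.integers(0, k_eff, size=n_new)]
    gaps = rng.random((n_new, 1))
    synthetic = X_min[seeds] + gaps * (X_min[partners] - X_min[seeds])

    X_out = np.vstack([X_maj, X_min, synthetic])
    y_out = np.concatenate([y_maj, np.full(len(X_min) + n_new, minority)])
    return X_out, y_out
```

In this sketch, weighting seed points by sparsity targets within-class imbalance, while generating samples until the class sizes match targets between-class imbalance. The paper's actual weighting and noise-removal rules may differ.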

Published
04-03-2020
How to Cite
Ali, H., Samat, N. A., & Ashghar, H. M. (2020). Adaptive Semi-Unsupervised Weighted Oversampling with Sparsity Factor for Imbalanced Biomedical Data. Journal of Soft Computing and Data Mining, 1(1), 17-26. Retrieved from https://publisher.uthm.edu.my/ojs/index.php/jscdm/article/view/6827
Section
Articles