Adaptive Semi-Unsupervised Weighted Oversampling with Sparsity Factor for Imbalanced Biomedical Data
Keywords:
Imbalanced data, Majority class, Data mining, Classification
Abstract
The surge in data has increased the complexity of the class imbalance problem. Real-world applications in industry, medicine, and banking often generate imbalanced data, which degrades the prediction of the minority class by machine learning algorithms and can result in high costs and even risk to life. In the biomedical field in particular, the diagnosis of cancer patients suffers from an uneven distribution of samples across classes. In this study, an oversampling method is proposed to address both between-class and within-class imbalance. The method removes noise from the datasets and draws on the significant concepts of the minority class for oversampling. It uses a sparsity factor, which significantly improved the learnability of the classification model by generating a larger number of appropriate synthetic samples in the minority class. To evaluate the method, a series of experiments was performed on four imbalanced microarray datasets. The classification results show that the proposed method outperforms the existing baseline techniques.