A Comparative Study of Filter-Based Feature Selection in High-Dimensional Medical Datasets Using Whale Optimization Variants
Keywords:
Filter-based feature selection, Hierarchical Whale Optimization Algorithm, Metaheuristic optimization, High-dimensional data, Microarray gene expression
Abstract
High-dimensional medical datasets, such as microarray gene expression profiles, pose significant challenges for feature selection (FS) because their large feature space and limited sample size often produce unstable and inconsistent outcomes. Metaheuristic algorithms such as the Whale Optimization Algorithm (WOA) have shown promise in FS; however, their reliance on a single leader and a linear control parameter typically leads to a poor exploration-exploitation balance and reduced stability. This study proposes a Hierarchical Whale Optimization Algorithm (HiWOA) for filter-based FS that incorporates two key enhancements: a hierarchical leadership strategy, in which three leaders guide the search, and an arcsine-based control parameter that enables a smoother transition between exploration and exploitation. Unlike earlier HiWOA applications in optimization or wrapper-based selection, this study is the first to adapt HiWOA to filter-based FS. Experiments were conducted on five benchmark medical datasets using four filter methods (ANOVA, Chi-square, Mutual Information, and Pearson Correlation), with performance evaluated in terms of algorithmic stability, FS stability, and classification accuracy using k-nearest neighbors (kNN). Results demonstrate that HiWOA consistently achieves a well-balanced exploration-exploitation ratio, with higher exploration rates (+25.66%), improved FS stability (+0.00261 in MKCI), and superior classification accuracy (+11.28%) compared to WOA and its variants. Among the filter methods, ANOVA combined with HiWOA delivered the most discriminative feature subsets, establishing a robust framework for high-dimensional medical data analysis.
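The two enhancements named in the abstract can be illustrated with a minimal sketch. The linear schedule below is the standard WOA control parameter, which decreases from 2 to 0. The arcsine-shaped schedule and the helper names (`arcsine_a`, `top_three_leaders`) are assumptions made for illustration only; the abstract does not give the paper's actual formula or update rule.

```python
import math


def linear_a(t, max_iter):
    """Standard WOA control parameter: decreases linearly from 2 to 0."""
    return 2.0 * (1.0 - t / max_iter)


def arcsine_a(t, max_iter):
    """Hypothetical arcsine-shaped schedule (an assumption, not the paper's formula).

    It also runs from 2 to 0, but stays above the linear schedule in between,
    which keeps the search in the exploratory regime for longer.
    """
    return 2.0 * (1.0 - (2.0 / math.pi) * math.asin(t / max_iter))


def top_three_leaders(population, fitness):
    """Return the three best candidates, assuming lower fitness is better."""
    ranked = sorted(zip(fitness, population), key=lambda pair: pair[0])
    return [candidate for _, candidate in ranked[:3]]


if __name__ == "__main__":
    # Compare the two schedules at a few points of a 100-iteration run.
    for t in (0, 25, 50, 75, 100):
        print(t, round(linear_a(t, 100), 3), round(arcsine_a(t, 100), 3))
```

In this sketch, a single-leader WOA would use only the first element returned by `top_three_leaders`, whereas a hierarchical variant would let all three guide the position update.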
License
Copyright (c) 2025 Journal of Soft Computing and Data Mining

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.









