A Comparative Study of Filter-Based Feature Selection in High-Dimensional Medical Datasets Using Whale Optimization Variants

Authors

  • Li Yu Yab Universiti Tun Hussein Onn Malaysia
  • Noorhaniza Wahid Universiti Tun Hussein Onn Malaysia
  • Rahayu A Hamid Universiti Tun Hussein Onn Malaysia
  • José Machado Universidade do Minho, Campus of Gualtar, Braga 4710-057, PORTUGAL

Keywords

Filter-based feature selection, Hierarchical Whale Optimization Algorithm, Metaheuristic optimization, High-dimensional data, Microarray gene expression

Abstract

High-dimensional medical datasets, such as microarray gene expression profiles, pose significant challenges for feature selection (FS) because they combine a large feature space with a limited sample size, which often yields unstable and inconsistent feature subsets. Metaheuristic algorithms such as the Whale Optimization Algorithm (WOA) have shown promise in FS; however, their reliance on a single leader and a linear control parameter typically leads to a poor exploration-exploitation balance and reduced stability. This study proposes a Hierarchical Whale Optimization Algorithm (HiWOA) for filter-based FS, incorporating two key enhancements: a hierarchical leadership strategy, in which three leaders guide the search, and an arcsine-based control parameter that enables a smoother transition between exploration and exploitation. Unlike earlier HiWOA applications in optimization or wrapper-based selection, this study is the first to adapt HiWOA to filter-based FS. Experiments were conducted on five benchmark medical datasets using four filter methods (ANOVA, Chi-square, Mutual Information, and Pearson Correlation), with performance evaluated in terms of algorithmic stability, FS stability, and classification accuracy using a k-nearest neighbors (kNN) classifier. Results demonstrate that HiWOA consistently achieves a well-balanced exploration-exploitation ratio with higher exploration rates (+25.66%), improved FS stability (+0.00261 in MKCI), and superior classification accuracy (+11.28%) compared to WOA and its variants. Among the filter methods, ANOVA combined with HiWOA delivered the most discriminative feature subsets, establishing a robust framework for high-dimensional medical data analysis.
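
The two enhancements named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the function names, the specific arcsine expression in `arcsine_a`, and the equal-weight averaging over three leaders in `three_leader_step` are hypothetical and may differ from the definitions in the paper. Only `linear_a` follows the standard WOA schedule, in which the control parameter falls linearly from 2 to 0. The random factor `r` is passed in explicitly to keep the example deterministic, and the usual WOA coefficient C is fixed at 1 for brevity.

```python
import math


def linear_a(t, T):
    """Standard WOA schedule: a decreases linearly from 2 (t=0) to 0 (t=T)."""
    return 2.0 * (1.0 - t / T)


def arcsine_a(t, T):
    """Hypothetical arcsine schedule (assumed form, not taken from the paper).

    It also runs from 2 to 0, but stays above the linear schedule in
    between, which would keep the search exploratory for longer.
    """
    return 2.0 * (1.0 - (2.0 / math.pi) * math.asin(t / T))


def three_leader_step(x, leaders, a, r):
    """Move position x using three leaders instead of a single best whale.

    Assumption: each leader proposes a WOA-style encircling candidate
    (leader - A * |leader - x|, with C fixed at 1), and the new position
    is the unweighted mean of the three candidates.
    """
    A = 2.0 * a * r - a  # WOA coefficient, lies in [-a, a] for r in [0, 1]
    candidates = [
        [l - A * abs(l - xi) for l, xi in zip(leader, x)]
        for leader in leaders
    ]
    # Average the candidates dimension by dimension.
    return [sum(vals) / len(candidates) for vals in zip(*candidates)]


if __name__ == "__main__":
    T = 100
    for t in (0, 25, 50, 75, 100):
        print(t, round(linear_a(t, T), 3), round(arcsine_a(t, T), 3))
```

Under these assumptions, both schedules share the same endpoints, so the difference lies only in how quickly the control parameter shrinks during the middle of the run. When the parameter reaches 0, the step collapses onto the mean of the three leaders, which is the pure-exploitation limit of this sketch.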

Published

28-12-2025

Section

Articles

How to Cite

Yab, L. Y., Wahid, N., A Hamid, R., & Machado, J. (2025). A Comparative Study of Filter-Based Feature Selection in High-Dimensional Medical Datasets Using Whale Optimization Variants. Journal of Soft Computing and Data Mining, 6(3), 219-242. https://publisher.uthm.edu.my/ojs/index.php/jscdm/article/view/23216