Wrapper Feature Selection Approach Based on Binary Firefly Algorithm for Spam E-mail Filtering
The main challenges in current spam email detection systems are the low accuracy of spam email classification and the high dimensionality encountered in feature selection. Feature selection (FS), as a global optimization approach in machine learning (ML), reduces data redundancy and produces a set of accurate and acceptable results. In this paper, a Firefly algorithm-based FS method is proposed to decrease the dimensionality of the features and enhance the accuracy of spam email classification. Each firefly represents the features in binary form; that is, the features are converted to binary using a sigmoid function. The proposed Binary Firefly Algorithm (BFA) searches the feature space for the best feature subsets, and features are selected according to a fitness function that depends on the accuracy achieved by a Naïve Bayesian Classifier (NBC). The BFA is evaluated on the SpamBase dataset in terms of both the performance of the classifier and the dimension of the selected feature vector used as classifier input. The experiments showed that the BFA achieves good FS results even with a small set of selected features. This suggests that good spam email classification accuracy can be achieved with the NBC-based BFA.
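The wrapper loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the update rule is the standard firefly move, and the parameter names and defaults (`beta0`, `gamma`, `alpha`, `n_fireflies`, `n_iter`) as well as the helper names are assumptions chosen for illustration. The fitness is passed in as a callable; in the paper it would be the NBC accuracy on the selected SpamBase features, whereas the toy fitness below is only a hypothetical stand-in so the sketch runs without a dataset.

```python
import math
import random


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow for extreme positions.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    # Feature j is selected (1) when a uniform draw falls below sigmoid(position[j]).
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def move_towards(xi, xj, beta0, gamma, alpha, rng):
    # Standard firefly move: attraction decays with squared distance, plus a small random step.
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    beta = beta0 * math.exp(-gamma * r2)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5) for a, b in zip(xi, xj)]


def binary_firefly(fitness, n_features, n_fireflies=10, n_iter=20,
                   beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    # All defaults are illustrative assumptions, not values taken from the paper.
    rng = random.Random(seed)
    positions = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
                 for _ in range(n_fireflies)]
    masks = [binarize(p, rng) for p in positions]
    scores = [fitness(m) for m in masks]
    best = max(range(n_fireflies), key=scores.__getitem__)
    best_mask, best_score = masks[best][:], scores[best]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if scores[j] > scores[i]:  # firefly i moves towards brighter firefly j
                    positions[i] = move_towards(positions[i], positions[j],
                                                beta0, gamma, alpha, rng)
                    masks[i] = binarize(positions[i], rng)
                    scores[i] = fitness(masks[i])
                    if scores[i] > best_score:
                        best_mask, best_score = masks[i][:], scores[i]
    return best_mask, best_score


def toy_fitness(mask):
    # Hypothetical stand-in for classifier accuracy: fraction of bits matching a fixed target.
    target = [1, 0, 1, 0, 0, 1]
    return sum(1 for m, t in zip(mask, target) if m == t) / len(target)


best_mask, best_score = binary_firefly(toy_fitness, n_features=6)
print(best_mask, best_score)
```

To mirror the wrapper setting, `toy_fitness` would be replaced by a function that trains and scores a Naïve Bayes classifier on only the columns where the mask is 1. A term penalizing the number of selected features could be added, although the abstract does not state whether the fitness includes one.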