E-mail Spam Filtering using Genetic Algorithm based on Probabilistic Weights and Words Count

  • Pronaya Bhattacharya
  • Arunendra Singh, Pranveer Singh Institute of Technology, Kanpur 209305, U.P., India
Keywords: Bayes, Naive Bayes, Spam, Ham, Genetic Algorithm

Abstract

Spam email filtering is an active area of research because the volume of spam keeps growing over time. Most spam mails are promotional in nature. Such mails are therefore not harmful to computers, but they are annoying for users. Spam mails can be filtered using classification methods such as Bayes and Naive Bayes. Classification is based on the content of the mail, in particular on its words: for each word, the probability of finding it among the spam and ham classifier words is calculated. A few words occur in both spam and ham mails, so a threshold-based mechanism is desirable for correct classification. For Bayes and Naive Bayes to classify correctly, the dataset should be huge; ideally, the number of mails should be infinite. In real applications, however, a scheme is desired that is adaptive in nature and provides good results with only a few mails. In this direction, this paper details a genetic algorithm based spam detection method that is very simple and provides good results with a limited dataset.
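
As an illustration of the word-probability and threshold idea mentioned in the abstract, the following is a minimal sketch of a Naive Bayes style scorer. It is not the genetic algorithm method of the paper, whose details the abstract does not give. The function names (`train`, `spam_probability`, `classify`), the add-one smoothing, the equal class prior, and the threshold of 0.8 are all assumptions chosen for the example.

```python
import math
from collections import Counter


def train(spam_mails, ham_mails):
    """Count how often each word occurs in spam mails and in ham mails."""
    spam_counts = Counter(w for m in spam_mails for w in m.lower().split())
    ham_counts = Counter(w for m in ham_mails for w in m.lower().split())
    return spam_counts, ham_counts


def spam_probability(mail, spam_counts, ham_counts, prior_spam=0.5):
    """Return an estimate of P(spam | words of the mail).

    Assumptions of this sketch: words are independent given the class,
    add-one (Laplace) smoothing is used, and unseen words are ignored.
    """
    vocab = set(spam_counts) | set(ham_counts)
    n_spam = sum(spam_counts.values())
    n_ham = sum(ham_counts.values())
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in mail.lower().split():
        if word not in vocab:
            continue  # a word never seen in training carries no evidence
        log_spam += math.log((spam_counts[word] + 1) / (n_spam + len(vocab)))
        log_ham += math.log((ham_counts[word] + 1) / (n_ham + len(vocab)))
    # Clamp the exponent so math.exp cannot overflow on long mails.
    diff = max(min(log_ham - log_spam, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(diff))


def classify(mail, spam_counts, ham_counts, threshold=0.8):
    """Label a mail as spam only if its spam probability reaches the threshold."""
    p = spam_probability(mail, spam_counts, ham_counts)
    return "spam" if p >= threshold else "ham"


# Tiny made-up training set, for illustration only.
spam_counts, ham_counts = train(
    ["win free prize now", "free offer click now"],
    ["meeting agenda for monday", "please review the report now"],
)
print(classify("free prize now", spam_counts, ham_counts))
print(classify("now", spam_counts, ham_counts))
```

In this toy dataset the word "now" appears in both spam and ham mails, so a mail containing only that word leans slightly towards spam but stays below the threshold and is labelled ham. This is the situation in which a threshold, rather than a plain majority decision, reduces false positives. With so few training mails the estimates are unreliable, which is the limitation the abstract raises for Bayes and Naive Bayes.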

Author Biography

Arunendra Singh, Pranveer Singh Institute of Technology, Kanpur 209305, U.P., India
Department of Information Technology
Published
31-01-2020
How to Cite
Bhattacharya, P., & Singh, A. (2020). E-mail Spam Filtering using Genetic Algorithm based on Probabilistic Weights and Words Count. International Journal of Integrated Engineering, 12(1), 40-49. Retrieved from https://publisher.uthm.edu.my/ojs/index.php/ijie/article/view/3397
Section
Articles