E-mail Spam Filtering using Genetic Algorithm based on Probabilistic Weights and Words Count

Authors

  • Pronaya Bhattacharya
  • Arunendra Singh, Pranveer Singh Institute of Technology, Kanpur 209305, U.P., India

Keywords:

Bayes, Naive Bayes, Spam, Ham, Genetic Algorithm

Abstract

Spam email filtering is an active area of research because the volume of spam keeps growing. Most spam mails are promotional in nature; they are therefore not harmful to computers, but they are annoying for users. Spam can be filtered using methods such as Bayes and Naive Bayes classification. Classification is based on the content of the mail, in particular on its words: the probability of finding each word among the spam and ham classifier words is calculated. A few words occur in both spam and ham mails, so a threshold-based mechanism is desirable for correct classification. For Bayes and Naive Bayes to classify correctly, the dataset should be huge; ideally, the number of mails would be infinite. Real applications, however, need a scheme that is adaptive and gives good results with only a few mails. In this direction, this paper details a genetic algorithm based spam detection method that is very simple and provides good results with a limited dataset.
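
The word-probability classification with a threshold described above can be illustrated with a minimal Python sketch. It is not the method of the paper: it is a plain Naive Bayes baseline with add-one smoothing, and the function names (`train`, `spam_probability`, `classify`), the whitespace tokenizer, the equal class prior, and the threshold value of 0.8 are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(spam_mails, ham_mails):
    """Count word occurrences per class (hypothetical helper, whitespace tokens)."""
    spam_counts = Counter(w for mail in spam_mails for w in mail.lower().split())
    ham_counts = Counter(w for mail in ham_mails for w in mail.lower().split())
    return spam_counts, ham_counts


def spam_probability(mail, spam_counts, ham_counts, prior_spam=0.5):
    """Estimate P(spam | words) with Naive Bayes and add-one smoothing."""
    vocab = set(spam_counts) | set(ham_counts)
    n_spam = sum(spam_counts.values())
    n_ham = sum(ham_counts.values())
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in mail.lower().split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        log_spam += math.log((spam_counts[word] + 1) / (n_spam + len(vocab)))
        log_ham += math.log((ham_counts[word] + 1) / (n_ham + len(vocab)))
    # Turn the two log scores into a probability for the spam class.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


def classify(mail, spam_counts, ham_counts, threshold=0.8):
    """Label a mail as spam only if its spam probability reaches the threshold."""
    p = spam_probability(mail, spam_counts, ham_counts)
    return "spam" if p >= threshold else "ham"


# Tiny made-up training set, for illustration only.
spam_counts, ham_counts = train(
    ["win free money now", "free offer click now"],
    ["meeting at noon tomorrow", "project report attached"],
)
print(classify("free money offer", spam_counts, ham_counts))
print(classify("project meeting tomorrow", spam_counts, ham_counts))
```

Raising the threshold makes the filter more conservative: a mail made up of words that occur in both classes gets a probability near the prior and is then kept as ham rather than discarded.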

Author Biography

  • Arunendra Singh, Pranveer Singh Institute of Technology, Kanpur 209305, U.P., India
    Department of Information Technology

Published

31-01-2020

Section

Articles

How to Cite

Bhattacharya, P., & Singh, A. (2020). E-mail Spam Filtering using Genetic Algorithm based on Probabilistic Weights and Words Count. International Journal of Integrated Engineering, 12(1), 40-49. https://publisher.uthm.edu.my/ojs/index.php/ijie/article/view/3397