Detection of topic on Health News in Twitter Data

Shum Chen Yau; Juhaida Abu Bakar; Azian Azamimi Abdullah; Hazlyna Harun; Ruziana Mohamad Rasli; Lim Zheng Yang; Evon Thum Yi Mun

Authors

Shum Chen Yau, Data Science Research Lab, School of Computing, UUM College of Arts and Sciences, Universiti Utara Malaysia, 06010 Sintok, Kedah, MALAYSIA
Juhaida Abu Bakar, Medical Devices and Life Sciences Cluster, Sport Engineering Research Centre, Centre of Excellence (SERC), Universiti Malaysia Perlis (UniMAP), 02600 Arau, Perlis, MALAYSIA
Azian Azamimi Abdullah, Medical Devices and Life Sciences Cluster, Sport Engineering Research Centre, Centre of Excellence (SERC), Universiti Malaysia Perlis (UniMAP), 02600 Arau, Perlis, MALAYSIA
Hazlyna Harun, Data Science Research Lab, School of Computing, UUM College of Arts and Sciences, Universiti Utara Malaysia, 06010 Sintok, Kedah, MALAYSIA
Ruziana Mohamad Rasli, Department of Information Technology and Communication, Tuanku Syed Sirajuddin Polytechnic, Pauh Putra, 02600 Arau, Perlis, MALAYSIA
Lim Zheng Yang, Data Science Research Lab, School of Computing, UUM College of Arts and Sciences, Universiti Utara Malaysia, 06010 Sintok, Kedah, MALAYSIA
Evon Thum Yi Mun, Data Science Research Lab, School of Computing, UUM College of Arts and Sciences, Universiti Utara Malaysia, 06010 Sintok, Kedah, MALAYSIA

Keywords:

Prolonged sitting, muscle activity, exercises on prolonged sitting

Abstract

The development and rapid popularization of the internet have led to exponential growth of online data, making text mining increasingly important. Users must search for information within the immense volume available online. Obtaining valuable information and automatically classifying, organizing and managing vast amounts of text make text processing even more difficult. To address these problems and requirements, intelligent information processing has been studied extensively. Topic modelling has been widely employed in natural language processing. Current research focuses on improving the speed and accuracy of text classification and topic detection, and on selecting feature methods that achieve better dimensionality reduction. The Latent Dirichlet Allocation (LDA) topic model works well for reducing noise in data, and it is widely used as a feature model combined with a classifier to achieve good classification performance. This study aims to conduct data mining and reduce the load imposed by a huge database. Three supervised learning algorithms are run: Naïve Bayes, Decision Tree and Random Forest. The Random Forest classifier outperforms the other two classifiers with 99.99% accuracy. Seven clusters for topic modelling are revealed using the Random Forest classifier. Each output is set to its four highest-weighted words and shows the highest term and its weight. The highest term in the dataset is 'Ebola'. The findings show that the combination of LDA and a supervised learning algorithm effectively addresses the problem of data sparseness in short text sets. Selecting the microblogs that are most likely to discuss news topics significantly reduces the size of the data of concern and, to a certain extent, eliminates the interference of non-news blogs.
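
The abstract describes using LDA as a feature model and reporting the four highest-weighted terms per topic. The sketch below illustrates that idea with a toy collapsed Gibbs sampler written in plain Python. It is not the authors' implementation: the sample headlines, the function names (`fit_lda`, `top_terms`, `doc_features`), the choice of two topics and the hyperparameters `alpha` and `beta` are all assumptions made for illustration. The classification step is not shown; the per-document topic proportions returned by `doc_features` are the kind of features that could be passed to a classifier such as the Random Forest named above.

```python
import random


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenised documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                              # total words assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word, topic_total


def top_terms(vocab, topic_word, topic_total, n=4, beta=0.01):
    """Return the n highest-weighted (term, weight) pairs for each topic."""
    vocab_size = len(vocab)
    result = []
    for counts, total in zip(topic_word, topic_total):
        weighted = [
            (vocab[v], (c + beta) / (total + vocab_size * beta))
            for v, c in enumerate(counts)
        ]
        weighted.sort(key=lambda pair: pair[1], reverse=True)
        result.append(weighted[:n])
    return result


def doc_features(docs, doc_topic, alpha=0.1):
    """Per-document topic proportions, usable as classifier features."""
    num_topics = len(doc_topic[0])
    return [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


# Invented example headlines (not taken from the study's dataset).
docs = [
    "ebola outbreak spreads in west africa".split(),
    "ebola vaccine trial begins in africa".split(),
    "new study links diet and heart disease".split(),
    "heart disease risk rises with poor diet".split(),
]

vocab, doc_topic, topic_word, topic_total = fit_lda(docs, num_topics=2)
topics = top_terms(vocab, topic_word, topic_total, n=4)
features = doc_features(docs, doc_topic)

for index, terms in enumerate(topics):
    print(index, [(term, round(weight, 3)) for term, weight in terms])
```

With a real corpus, a maintained library implementation of LDA would normally be preferable to this hand-written sampler, which is kept short only to make the counting and resampling steps visible.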
