KAROMA: Karonese Morphological Analyzer Based on Graph Theory
Keywords:
KAROMA, graph theory, member checking, text similarity-based
Abstract
Karonese is a local language of the Karo ethnic group of North Sumatra, Indonesia. Karonese terms have a distinctive phonology: a term can vary in spelling and pronunciation while retaining the same meaning and time. A morphological analyzer is critical for advancing Natural Language Processing (NLP) research on local languages, Karonese included. This work proposes KAROMA, a morphological analyzer for Karonese based on graph theory. Because of this phonology, the analyzer follows a word-based morphology approach. Karonese terms that vary in spelling and pronunciation while retaining the same meaning and time are represented as a complete graph, and the set of these complete graphs forms the Karonese WordNet. Stemming and lemmatization for Karonese are then performed by lookup in this WordNet. The study also provides two evaluators for KAROMA: one based on member checking and one based on text similarity, using a modified cosine similarity. The evaluation uses synthetic Karonese sentences to calculate text similarity. As a result, KAROMA detects the variant forms of Karonese terms and normalizes them. KAROMA achieves 99% under the member checking evaluation and 97.16% under the text similarity evaluation. This result supports further NLP research for Karonese, such as sentiment analysis and text summarization.
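The idea of grouping spelling variants into complete graphs and normalizing by lookup can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the tokens are placeholder strings rather than real Karonese terms, the choice of representative (alphabetically smallest variant) is arbitrary, and plain cosine similarity over token counts stands in for the modified cosine similarity, whose details the abstract does not give.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def build_variant_graph(variant_groups):
    """Build adjacency sets in which every group of variants is a complete graph."""
    adjacency = {}
    for group in variant_groups:
        for term in group:
            adjacency.setdefault(term, set())
        # Connect every pair inside the group, so the group becomes a clique.
        for a, b in combinations(group, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def canonical_form(term, adjacency):
    """Map a term to one representative of its clique.

    Assumption: the alphabetically smallest variant is the representative.
    Terms not in the graph are returned unchanged.
    """
    if term not in adjacency:
        return term
    return min(adjacency[term] | {term})


def normalize(sentence, adjacency):
    """Lowercase, split on whitespace, and replace each token by its representative."""
    return [canonical_form(tok, adjacency) for tok in sentence.lower().split()]


def cosine_similarity(tokens_a, tokens_b):
    """Plain cosine similarity between two bags of tokens."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# Placeholder variant groups (NOT real Karonese terms).
groups = [["aaa", "aab", "aac"], ["bba", "bbb"]]
graph = build_variant_graph(groups)

raw_1, raw_2 = "aab bbb ccc", "aac bba ccc"
norm_1, norm_2 = normalize(raw_1, graph), normalize(raw_2, graph)

print("raw similarity:       ", cosine_similarity(raw_1.split(), raw_2.split()))
print("normalized similarity:", cosine_similarity(norm_1, norm_2))
```

In this toy setting the two sentences differ only in variant spellings, so their similarity rises after normalization. This mirrors, in a very simplified way, how a text similarity score could be used to judge a normalizer.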
License
Copyright (c) 2024 Journal of Soft Computing and Data Mining
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.