Auto-Diacritization and Stylistic Realization of Arabic Text using Deep Neural Networks
Keywords:
Arabic Diacritization, Quranic Linguistic Styles, NLP, LSTM, GRU, Transformer-Based Models, Arabic Dialect, Textual Data Processing, Quran
Abstract
This paper presents QRDiaRec, a diacritization system for Arabic Quranic texts. In Arabic, linguistic style refers to variations in diacritic marking that convey different pronunciations, dialects, meanings, and contextual readings. QRDiaRec addresses the challenge of interpreting Arabic diacritics across multiple linguistic styles, which is crucial for accurate language processing. Traditional systems generate a single correct diacritization. QRDiaRec, by contrast, can recognize and produce multiple valid diacritic forms. This capability comes from training on a dataset that covers seven Quranic linguistic styles. The Qur'an is an ideal case study because it is a single text with multiple linguistic patterns, which lets us recognize different correct diacritizations of the same words. QRDiaRec uses bidirectional LSTM, GRU, and transformer-based models to convert undiacritized text into diacritized text, achieving up to 94.2% accuracy. The system advances Arabic language processing, with applications in NLP, machine translation, and Arabic linguistics.
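The abstract describes converting undiacritized text into diacritized text but does not detail the data pipeline. A common framing, assumed here rather than taken from the paper, treats diacritization as character-level sequence labeling: strip the marks to obtain the model input, and give each base character the marks that follow it as its target label. The standard-library Python sketch below illustrates that framing. The diacritic set, all function names, and the `best_style_accuracy` scoring against several reference styles are hypothetical illustrations, not QRDiaRec's actual method.

```python
# Hypothetical sketch: diacritization framed as character-level sequence labeling.
# Not QRDiaRec's implementation; the diacritic set below is an assumption.

# Arabic harakat U+064B (fathatan) .. U+0652 (sukun), including shadda U+0651,
# plus superscript alef U+0670. Adjust for the corpus in use.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Remove diacritic marks, producing the undiacritized model input."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def to_labels(text: str) -> list[tuple[str, str]]:
    """Split diacritized text into (base character, diacritic label) pairs.

    Each base character receives the string of marks that follow it
    (empty if none), so a shadda + vowel combination becomes one label.
    """
    pairs: list[tuple[str, str]] = []
    for ch in text:
        if ch in DIACRITICS:
            if not pairs:
                continue  # stray mark with no preceding base character
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def char_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of positions whose predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("label sequences must be aligned")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def best_style_accuracy(predicted: list[str], references: list[list[str]]) -> float:
    """Hypothetical multi-style score: accuracy against the closest reference style."""
    return max(char_accuracy(predicted, ref) for ref in references)
```

For example, `to_labels("\u0628\u0650\u0633\u0652\u0645\u0650")` (bismi) yields three (letter, mark) pairs, and `strip_diacritics` on the same string returns the bare letters. In a full system, the stripped text would feed the BiLSTM, GRU, or transformer encoder, and each label would be one classification target. Scoring against the closest of several references is one simple way to avoid penalizing a prediction that is valid under a different linguistic style.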
License
Copyright (c) 2024 Journal of Quranic Sciences and Research
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.