Auto-Diacritization and Stylistic Realization of Arabic Text using Deep Neural Networks

Authors

  • Adel Sabour, University of Washington, Computer Science and Systems, Tacoma, USA
  • Mohamed Ali, University of Rhode Island
  • Abdeltawab Hendawi, University of Washington

Keywords:

Arabic Diacritization, Quranic Linguistic Styles, NLP, LSTM, GRU, Transformer-Based Models, Arabic Dialect, Textual Data Processing, Quran

Abstract

This paper presents QRDiaRec, a diacritization system for Arabic Quranic text. In Arabic, linguistic style refers to the variations in diacritic markings used to convey different pronunciations, dialects, meanings, and contextual understandings. QRDiaRec addresses the challenge of interpreting Arabic diacritics across multiple linguistic styles, which is crucial for accurate language processing. Unlike traditional systems, which generate only a single correct diacritized form, QRDiaRec can recognize and produce multiple valid diacritized forms. This capability comes from training on a dataset that covers seven Quranic linguistic styles. The Qur'an is an ideal case study because it is a single text with multiple linguistic patterns, which makes it possible to recognize different correct diacritizations of the same text. QRDiaRec employs bidirectional LSTM, GRU, and transformer-based models to convert undiacritized text into its diacritized form, achieving up to 94.2% accuracy. The system enhances Arabic language processing, with implications for NLP, machine translation, and Arabic linguistics.
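
To make the task concrete, diacritization is commonly framed as per-character labeling: each base letter in the undiacritized input receives a (possibly empty) string of diacritic marks. The Python sketch below shows only that data framing, not the QRDiaRec models themselves. The function names `split_diacritics` and `merge_diacritics` and the chosen set of marks are assumptions made for illustration and do not come from the paper.

```python
# Assumed set of marks: the Arabic combining diacritics U+064B..U+0652
# (fathatan through sukun) plus the superscript (dagger) alif U+0670.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def split_diacritics(text):
    """Split diacritized text into base characters and per-character labels.

    labels[i] is the string of diacritics that follows bases[i]
    (empty if the character carries none).
    """
    bases, labels = [], []
    for ch in text:
        if ch in DIACRITICS:
            if bases:  # ignore a stray mark with no preceding base character
                labels[-1] += ch
        else:
            bases.append(ch)
            labels.append("")
    return "".join(bases), labels


def merge_diacritics(bases, labels):
    """Inverse of split_diacritics: reattach each label to its base character."""
    return "".join(b + l for b, l in zip(bases, labels))


# Example: the word "bismi" written as ba+kasra, sin+sukun, mim+kasra.
word = "\u0628\u0650\u0633\u0652\u0645\u0650"
bases, labels = split_diacritics(word)
restored = merge_diacritics(bases, labels)
```

Under this framing, a sequence model takes `bases` as input and predicts `labels`. A multi-style system could, for instance, keep one label sequence per linguistic style for the same `bases`; whether QRDiaRec is organized this way is not stated in the abstract.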

Published

04-07-2024

How to Cite

Sabour, A., Ali, M., & Hendawi, A. (2024). Auto-Diacritization and Stylistic Realization of Arabic Text using Deep Neural Networks. Journal of Quranic Sciences and Research, 5(1), 12-29. https://publisher.uthm.edu.my/ojs/index.php/jqsr/article/view/17740