The Use of Transformer-Based Models for Automatic Short-answer Scoring in Education: A Systematic Literature Review
Keywords:
automatic short-answer scoring, quality education, large language models, artificial intelligence for education

Abstract
This review examines recent advances in Automatic Short-Answer Scoring (ASAS) systems in education. Its primary objective is to identify current trends in applying transformer-based models to ASAS; it also discusses future directions for ASAS technology. Conventional machine learning methods for ASAS were inconsistent because they relied on statistical similarity and were prone to bias. Transformer-based models, by contrast, are typically used for feature extraction, embedding, and score calculation via classification or regression, generally serving as similarity calculators that compare students' answers to a reference answer. We applied the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to survey the state of the art in ASAS and uncover emerging trends. Our findings reveal that transformer-based models significantly outperform traditional machine learning approaches by capturing complex context, while large language models (LLMs) excel at providing feedback and score justification. Recent studies show a shift toward using transformer-based models for more complex ASAS tasks, including data augmentation and feedback generation. However, further research is needed on using LLMs and GPT-style models to generate explainable, fairer scores and to address data scarcity through reasoning and augmentation.
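The similarity-calculator pattern the abstract describes can be sketched minimally as follows. This is an illustrative assumption, not a method from any reviewed paper: the embedding vectors below are toy placeholders standing in for transformer sentence embeddings (e.g. from an encoder such as BERT), and the linear similarity-to-score mapping is a hypothetical choice.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_to_score(sim: float, max_score: float = 5.0) -> float:
    """Map similarity in [-1, 1] to a grade in [0, max_score].

    Negative similarities are clamped to zero before scaling;
    the linear mapping is a simplifying assumption.
    """
    return round(max(sim, 0.0) * max_score, 2)

# Toy vectors standing in for transformer embeddings of the
# reference answer and a student's answer (hypothetical values).
reference_embedding = np.array([0.9, 0.1, 0.3])
student_embedding = np.array([0.8, 0.2, 0.35])

score = similarity_to_score(cosine_similarity(reference_embedding,
                                              student_embedding))
```

In practice the embeddings would come from a pretrained transformer encoder, and many reviewed systems replace the fixed similarity-to-score mapping with a trained regression or classification head.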
License
Copyright (c) 2025 Journal of Soft Computing and Data Mining

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.