Unveiling the Landscape: Studies on Automated Short Answer Evaluation
Abstract
Evaluation is an essential component of the learning process for diagnosing learning situations. Assessing natural language responses, such as short answers, takes considerable time and effort. Advances in artificial intelligence and natural language processing have led to a growing number of studies on automatically grading short answers. In this review, we systematically analyze short-answer evaluation studies and present the development of the field in terms of scientific production characteristics, datasets, and automatic evaluation features. The field developed from pioneering studies in the US, and researchers generally conduct applications with English datasets. Research has increased significantly in recent years with large language models that support many different languages, and applications of these models achieve accuracy close to that of human evaluators. In addition, deep learning models do not require the detailed preprocessing and feature engineering of traditional approaches. The trend in dataset size is toward 1,000 or more responses. Metrics such as accuracy, precision, and F1 score were used to determine performance. The majority of studies focus on scoring or rating; in this context, the literature on evaluation systems that can provide descriptive and formative feedback remains limited. In addition, the assessment systems that have been developed need to be actively used in learning environments.
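As a minimal, hypothetical sketch (not taken from any of the reviewed studies), the following Python snippet illustrates how the reported performance metrics, accuracy, precision, and F1 score, are typically computed when an automated short-answer grader's labels are compared against human reference grades; the scikit-learn functions and the example labels are assumptions introduced here for illustration only.

# Illustrative comparison of an automated grader's labels against human grades.
# Label values below are invented for demonstration purposes.
from sklearn.metrics import accuracy_score, precision_score, f1_score

human_grades = ["correct", "incorrect", "correct", "partial", "correct", "incorrect"]
model_grades = ["correct", "incorrect", "partial", "partial", "correct", "correct"]

print("Accuracy :", accuracy_score(human_grades, model_grades))
print("Precision:", precision_score(human_grades, model_grades, average="macro", zero_division=0))
print("F1 score :", f1_score(human_grades, model_grades, average="macro", zero_division=0))

Macro averaging is used here because short-answer grading is usually a multi-class task (e.g., correct, partially correct, incorrect); other averaging schemes are equally possible depending on the study.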
Article Details
The work published in AjDE is licensed under a Creative Commons Attribution ShareAlike 4.0 International Licence (CC-BY).