Neural Automated Writing Evaluation with Corrective Feedback
- URL: http://arxiv.org/abs/2402.17613v2
- Date: Mon, 6 May 2024 10:02:08 GMT
- Title: Neural Automated Writing Evaluation with Corrective Feedback
- Authors: Izia Xiaoxiao Wang, Xihan Wu, Edith Coates, Min Zeng, Jiexin Kuang, Siliang Liu, Mengyang Qiu, Jungyeul Park
- Abstract summary: We propose an integrated system for automated writing evaluation with corrective feedback.
This system enables language learners to simulate essay writing tests.
It would also alleviate the burden of manually correcting innumerable essays.
- Score: 4.0230668961961085
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The utilization of technology in second language learning and teaching has become ubiquitous. For the assessment of writing specifically, automated writing evaluation (AWE) and grammatical error correction (GEC) have become immensely popular and effective methods for enhancing writing proficiency and delivering instant, individualized feedback to learners. By leveraging the power of natural language processing (NLP) and machine learning algorithms, AWE and GEC systems have been developed separately to provide language learners with automated corrective feedback and with scoring that is more accurate and unbiased than scoring left solely to human examiners. In this paper, we propose an integrated system for automated writing evaluation with corrective feedback as a means of bridging the gap between AWE and GEC results for second language learners. This system enables language learners to simulate essay writing tests: a student writes and submits an essay, and the system returns an assessment of the writing along with suggested grammatical error corrections. Given that automated scoring and grammatical correction are more efficient and cost-effective than human grading, this integrated system would also alleviate the burden of manually correcting innumerable essays.
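A minimal sketch of the submit-and-assess workflow the abstract describes; `score_essay` and `correct_grammar` are hypothetical placeholders for the AWE and GEC components, which are not specified here:

```python
# Sketch only: the AWE scorer and GEC corrector below are stand-ins,
# not the paper's actual models.
from dataclasses import dataclass
from typing import List

@dataclass
class Assessment:
    score: float            # holistic writing score returned by the AWE model
    corrections: List[str]  # suggested grammatical error corrections from GEC

def score_essay(essay: str) -> float:
    """AWE component: a trained automated scorer would go here."""
    return 6.5  # placeholder band score

def correct_grammar(essay: str) -> List[str]:
    """GEC component: a trained error-correction model would go here."""
    return ["She go to school -> She goes to school"]  # placeholder edit

def evaluate(essay: str) -> Assessment:
    """A student submits an essay; the system returns a score plus feedback."""
    return Assessment(score=score_essay(essay),
                      corrections=correct_grammar(essay))

print(evaluate("She go to school every day."))
```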
Related papers
- Spoken Grammar Assessment Using LLM [10.761744330206065]
Spoken language assessment (SLA) systems assess the pronunciation and oral fluency of a speaker by analysing read and spontaneous spoken utterances, respectively.
Most written language assessment (WLA) systems present a set of sentences from a curated, finite-size database of sentences, thereby making it possible to anticipate the test questions and train for them.
We propose a novel end-to-end SLA system to assess language grammar from spoken utterances, thus making WLA systems redundant.
arXiv Detail & Related papers (2024-10-02T14:15:13Z)
- Grammatical Error Feedback: An Implicit Evaluation Approach [32.98100553225724]
Grammatical feedback is crucial for consolidating second language (L2) learning.
Most research in computer-assisted language learning has focused on feedback through grammatical error correction (GEC) systems.
This paper exploits an implicit evaluation framework to examine the quality of, and need for, GEC to generate feedback, as well as the system used to generate feedback, using essays from the Cambridge Learner Corpus.
arXiv Detail & Related papers (2024-08-18T18:31:55Z)
- LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier.
Our experiments reveal that a small set of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z)
- Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
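As a rough illustration of the DPO objective over such feedback pairs (tensor names and values are illustrative, not the paper's code):

```python
# Sketch of the standard DPO loss: prefer the feedback GPT-4 annotated
# as better, relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Push the policy toward the preferred feedback in each pair."""
    chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen - rejected).mean()

# Summed log-probabilities of the preferred/rejected feedback under the
# policy and reference models (illustrative numbers).
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.9]),
                torch.tensor([-13.0]), torch.tensor([-14.8]))
print(loss.item())
```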
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
- DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models [76.79929883963275]
DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems.
It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level.
Our evaluations reveal that while fine-tuning pretrained language models (PLMs) leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
arXiv Detail & Related papers (2024-01-04T11:27:48Z)
- Evaluation of really good grammatical error correction [0.0]
Grammatical Error Correction (GEC) encompasses various models with distinct objectives.
Traditional evaluation methods fail to capture the full range of system capabilities and objectives.
arXiv Detail & Related papers (2023-08-17T13:45:35Z)
- User Adaptive Language Learning Chatbots with a Curriculum [55.63893493019025]
We adapt lexically constrained decoding to a dialog system, which urges the dialog system to include curriculum-aligned words and phrases in its generated utterances.
The evaluation result demonstrates that the dialog system with curriculum infusion improves students' understanding of target words and increases their interest in practicing English.
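A rough sketch of the lexically constrained decoding idea, using Hugging Face transformers' `force_words_ids`; the model and curriculum word are stand-ins, not the paper's setup:

```python
# Sketch only: constrained beam search that forces a curriculum word
# into the generated utterance.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "facebook/blenderbot-400M-distill"  # stand-in dialog model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

curriculum = ["journey"]  # hypothetical curriculum-aligned target word
force_words_ids = [tokenizer(w, add_special_tokens=False).input_ids
                   for w in curriculum]

inputs = tokenizer("Tell me about your holiday.", return_tensors="pt")
outputs = model.generate(**inputs,
                         force_words_ids=force_words_ids,
                         num_beams=5,  # constrained decoding needs beam search
                         max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```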
arXiv Detail & Related papers (2023-04-11T20:41:41Z)
- Gender Bias and Universal Substitution Adversarial Attacks on Grammatical Error Correction Systems for Automated Assessment [1.4213973379473654]
GEC systems are often used on speech transcriptions of English learners as a form of assessment and feedback.
The count of edits from a candidate's input sentence to a GEC system's grammatically corrected output sentence is indicative of a candidate's language ability.
This work examines a simple universal substitution adversarial attack that non-native speakers of English could realistically employ to deceive GEC systems used for assessment.
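For instance, a crude edit-count proxy for this assessment signal can be computed from a token-level alignment (the sentences below are just examples, not the paper's data):

```python
# Sketch of an edit-count proxy between a learner sentence and a GEC
# system's corrected output.
import difflib

def edit_count(source: str, corrected: str) -> int:
    """Count insert/delete/replace operations between token sequences."""
    ops = difflib.SequenceMatcher(a=source.split(),
                                  b=corrected.split()).get_opcodes()
    return sum(1 for tag, *_ in ops if tag != "equal")

# Fewer edits reads as higher proficiency, which is exactly the signal a
# universal substitution attack tries to exploit.
print(edit_count("She go to school yesterday",
                 "She went to school yesterday"))  # 1
```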
arXiv Detail & Related papers (2022-08-19T17:44:13Z)
- A Proposal of Automatic Error Correction in Text [0.0]
We present an application for the automatic recognition and correction of orthographic errors in electronic texts.
The proposal is based on part-of-speech text categorization, word similarity, word dictionaries, statistical measures, morphological analysis, and an n-gram-based language model of Spanish.
arXiv Detail & Related papers (2021-09-24T17:17:56Z)
- Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale [52.663117551150954]
A few popular metrics remain the de facto standard for evaluating tasks such as image captioning and machine translation.
This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them.
In this paper, we urge the community for more careful consideration of how they automatically evaluate their models.
arXiv Detail & Related papers (2020-10-26T13:57:20Z)
- Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction [106.63733511672721]
We propose a novel language-independent approach that improves the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. ESC leverages a seq2seq model that takes the sentence with annotated erroneous spans as input and outputs only the corrected text for these spans.
Experiments show our approach performs comparably to conventional seq2seq approaches on both English and Chinese GEC benchmarks with less than 50% of the inference time cost.
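A minimal sketch of that two-stage pipeline; both model calls are hypothetical placeholders, not the authors' released code:

```python
# Sketch only: detect_spans and correct_span stand in for the ESD tagger
# and ESC seq2seq model.
from typing import List, Tuple

def detect_spans(tokens: List[str]) -> List[Tuple[int, int]]:
    """ESD: a sequence tagger would return (start, end) erroneous spans."""
    return [(1, 2)]  # placeholder: pretend token 1 is erroneous

def correct_span(tokens: List[str], span: Tuple[int, int]) -> List[str]:
    """ESC: a seq2seq model rewrites only the annotated span."""
    return ["went"]  # placeholder correction for the span

def correct(sentence: str) -> str:
    tokens = sentence.split()
    # Apply corrections right-to-left so earlier span indices stay valid.
    for start, end in reversed(detect_spans(tokens)):
        tokens[start:end] = correct_span(tokens, (start, end))
    return " ".join(tokens)

print(correct("She go to school yesterday"))  # "She went to school yesterday"
```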
arXiv Detail & Related papers (2020-10-07T08:29:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.