Grammatical Error Correction Evaluation by Optimally Transporting Edit Representation
- URL: http://arxiv.org/abs/2602.05419v1
- Date: Thu, 05 Feb 2026 08:05:42 GMT
- Title: Grammatical Error Correction Evaluation by Optimally Transporting Edit Representation
- Authors: Takumi Goto, Yusuke Sakai, Taro Watanabe
- Abstract summary: This study focuses on edits specifically designed for grammatical error correction (GEC). We propose the edit vector, a representation of an edit, and introduce a new metric, UOT-ERRANT, which transports these edit vectors from the hypothesis to the reference using unbalanced optimal transport. Experiments with the SEEDA meta-evaluation show that UOT-ERRANT improves evaluation performance, particularly in the +Fluency domain.
- Score: 34.071151696990384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic evaluation in grammatical error correction (GEC) is crucial for selecting the best-performing systems. Currently, reference-based metrics are a popular choice; they measure the similarity between hypothesis and reference sentences. However, similarity measures based on embeddings, such as BERTScore, are often ineffective, since many words in the source sentence remain unchanged in both the hypothesis and the reference. This study focuses on edits specifically designed for GEC, i.e., ERRANT edits, and computes similarity over the edits extracted from the source sentence. To this end, we propose the edit vector, a representation of an edit, and introduce a new metric, UOT-ERRANT, which transports these edit vectors from the hypothesis to the reference using unbalanced optimal transport. Experiments with the SEEDA meta-evaluation show that UOT-ERRANT improves evaluation performance, particularly in the +Fluency domain, where many edits occur. Moreover, our method is highly interpretable because the transport plan can be read as a soft edit alignment, making UOT-ERRANT a useful metric for both ranking and analyzing GEC systems. Our code is available from https://github.com/gotutiyan/uot-errant.
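The abstract leaves the edit-vector construction and solver settings unspecified, so the following is only a minimal sketch: random stand-in embeddings play the role of edit vectors, and the POT library's entropic unbalanced Sinkhorn solver (with illustrative regularization values) produces the transport plan that doubles as a soft edit alignment.

```python
# Sketch of scoring a hypothesis against a reference by transporting
# edit vectors with unbalanced optimal transport (UOT).
# Requires numpy and POT (pip install pot); the embeddings below are
# random stand-ins for real edit representations.
import numpy as np
import ot  # Python Optimal Transport

rng = np.random.default_rng(0)

# Hypothetical edit vectors: 3 hypothesis edits, 2 reference edits.
hyp_edits = rng.normal(size=(3, 64))
ref_edits = rng.normal(size=(2, 64))

# Unit mass per edit; unlike balanced OT, UOT tolerates unequal totals,
# which matters because hypothesis and reference edit counts differ.
a = np.ones(len(hyp_edits))
b = np.ones(len(ref_edits))

# Cost matrix: cosine distance between edit vectors.
M = ot.dist(hyp_edits, ref_edits, metric="cosine")

# Entropic unbalanced Sinkhorn; reg and reg_m are illustrative values.
plan = ot.unbalanced.sinkhorn_unbalanced(a, b, M, reg=0.05, reg_m=1.0)

# Lower total transport cost means the two edit sets agree more closely;
# the plan itself is a soft alignment between edits.
print("transport cost:", float((plan * M).sum()))
print("soft edit alignment:\n", plan.round(3))
```

Because the marginal constraints are relaxed, unmatched edits simply keep or shed their mass instead of being force-aligned, which is what makes the plan readable as a soft edit alignment.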
Related papers
- RE$^2$: Improving Chinese Grammatical Error Correction via Retrieving Appropriate Examples with Explanation [44.80444520411601]
The primary objective of Chinese grammatical error correction (CGEC) is to detect and correct errors in Chinese sentences. For large language models (LLMs), selecting appropriate reference examples can help improve their performance. We propose a method named RE$^2$, which retrieves appropriate examples with explanations of grammatical errors.
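As a rough illustration of retrieval-augmented prompting (the example pool, TF-IDF retriever, and prompt template below are invented stand-ins, not RE$^2$'s actual components), one can fetch the most similar (error, correction, explanation) triples and splice them into a few-shot prompt:

```python
# Minimal sketch of example retrieval for few-shot GEC prompting.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy pool of (erroneous sentence, correction, explanation) triples.
pool = [
    ("他昨天去学校了吗。", "他昨天去学校了吗？", "Questions end with a question mark."),
    ("我有三个苹果们。", "我有三个苹果。", "Nouns after numerals take no plural suffix."),
    ("她很喜欢吃苹果很。", "她很喜欢吃苹果。", "The degree adverb appears only before the verb."),
]
query = "我买了两本书们。"

# Character n-gram TF-IDF works without a Chinese tokenizer.
vec = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
matrix = vec.fit_transform([q for q, _, _ in pool] + [query])
sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

# Pick the top-k most similar examples and build a prompt.
top_k = sims.argsort()[::-1][:2]
shots = "\n".join(
    f"Input: {pool[i][0]}\nCorrection: {pool[i][1]}\nExplanation: {pool[i][2]}"
    for i in top_k
)
prompt = f"{shots}\nInput: {query}\nCorrection:"
print(prompt)
```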
arXiv Detail & Related papers (2025-09-30T10:14:19Z)
- Improving Explainability of Sentence-level Metrics via Edit-level Attribution for Grammatical Error Correction [11.512856112792093]
We propose attributing sentence-level scores to individual edits, providing insight into how specific corrections contribute to the overall performance. Experiments with existing sentence-level metrics demonstrate high consistency across different edit granularities and show approximately 70% alignment with human evaluations. In addition, we analyze biases in the metrics based on the attribution results, revealing trends such as the tendency to ignore orthographic edits.
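One simple way to realize edit-level attribution is leave-one-out ablation: re-score the hypothesis with each edit withheld and treat the score drop as that edit's contribution. The sketch below uses a toy token-overlap metric; the paper's actual attribution scheme may differ.

```python
# Leave-one-out attribution sketch: the contribution of an edit is the
# change in a sentence-level metric when that edit is withheld.
# `apply_edits` and the toy scorer are illustrative, not the paper's code.
def apply_edits(source: str, edits: list[tuple[int, int, str]]) -> str:
    """Apply (start, end, replacement) character-span edits right-to-left."""
    out = source
    for start, end, repl in sorted(edits, reverse=True):
        out = out[:start] + repl + out[end:]
    return out

def score(hypothesis: str, reference: str) -> float:
    """Toy sentence-level metric: token-overlap F1 against the reference."""
    hyp, ref = set(hypothesis.split()), set(reference.split())
    if not hyp or not ref:
        return 0.0
    overlap = len(hyp & ref)
    p, r = overlap / len(hyp), overlap / len(ref)
    return 2 * p * r / (p + r) if p + r else 0.0

source = "He go to school yesterday ."
reference = "He went to school yesterday ."
edits = [(3, 5, "went")]  # "go" -> "went"

full = score(apply_edits(source, edits), reference)
for i, edit in enumerate(edits):
    ablated = [e for j, e in enumerate(edits) if j != i]
    contribution = full - score(apply_edits(source, ablated), reference)
    print(f"edit {edit}: contribution {contribution:+.3f}")
```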
arXiv Detail & Related papers (2024-12-17T17:31:17Z)
- LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction [49.0746090186582]
Over-correction is a critical problem in the Chinese grammatical error correction (CGEC) task.
Recent work using model ensemble methods can effectively mitigate over-correction and improve the precision of the GEC system.
We propose LM-Combiner, a rewriting model that directly revises over-corrections in GEC system outputs without a model ensemble.
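The essence of such a combiner can be sketched as: diff the GEC output against the source, then keep or revert each edit. In the toy version below the keep/reject decision is a hand-written stub, whereas LM-Combiner learns this decision with a trained rewriting model.

```python
# Sketch of combining a source sentence with a GEC hypothesis by
# filtering edits; the filter here is a stub, not a learned model.
import difflib

def diff_edits(source: str, hypothesis: str):
    """Extract (src_span, hyp_span, src_range) word-level edits via difflib."""
    src, hyp = source.split(), hypothesis.split()
    sm = difflib.SequenceMatcher(a=src, b=hyp)
    return [(src[i1:i2], hyp[j1:j2], (i1, i2))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def keep_edit(src_span, hyp_span) -> bool:
    """Stub filter: reject pure deletions (a toy stand-in for a learned
    over-correction detector)."""
    return not (src_span and not hyp_span)

def combine(source: str, hypothesis: str) -> str:
    src = source.split()
    out, cursor = [], 0
    for src_span, hyp_span, (i1, i2) in diff_edits(source, hypothesis):
        out.extend(src[cursor:i1])
        out.extend(hyp_span if keep_edit(src_span, hyp_span) else src_span)
        cursor = i2
    out.extend(src[cursor:])
    return " ".join(out)

source = "I think that he is a honest man"
over_corrected = "I think he is an honest man"  # unnecessarily drops "that"
print(combine(source, over_corrected))          # keeps "a"->"an", restores "that"
```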
arXiv Detail & Related papers (2024-03-26T06:12:21Z)
- Revisiting Meta-evaluation for Grammatical Error Correction [14.822205658480813]
SEEDA is a new dataset for GEC meta-evaluation.
It consists of corrections with human ratings at two different granularities.
The results suggest that edit-based metrics may have been underestimated in existing studies.
arXiv Detail & Related papers (2024-03-05T05:53:09Z)
- CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction [32.44051877804761]
Chunk-LEvel Multi-reference Evaluation (CLEME) is designed to evaluate Grammatical Error Correction (GEC) systems in the multi-reference evaluation setting.
We conduct experiments on six English reference sets based on the CoNLL-2014 shared task.
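For contrast, the conventional multi-reference protocol scores the hypothesis edit set against each reference separately and keeps the best score; CLEME's chunk-level formulation replaces this per-reference matching. The hand-made edit sets below are purely illustrative.

```python
# Conventional best-match multi-reference edit scoring, shown for
# contrast with chunk-level evaluation. Edits are (start, end, repl).
def f_beta(hyp_edits: set, ref_edits: set, beta: float = 0.5) -> float:
    if not hyp_edits or not ref_edits:
        return 1.0 if hyp_edits == ref_edits else 0.0
    tp = len(hyp_edits & ref_edits)
    p, r = tp / len(hyp_edits), tp / len(ref_edits)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

hyp = {(3, 5, "went"), (10, 12, "an")}
references = [
    {(3, 5, "went")},                  # reference 1
    {(3, 5, "went"), (10, 12, "an")},  # reference 2
]
# Keep the best-matching reference's score.
print(max(f_beta(hyp, ref) for ref in references))  # 1.0
```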
arXiv Detail & Related papers (2023-05-18T08:57:17Z)
- End-to-End Page-Level Assessment of Handwritten Text Recognition [69.55992406968495]
Handwritten text recognition (HTR) systems increasingly face the end-to-end page-level transcription of a document.
Standard metrics do not take into account the inconsistencies that might appear.
We propose a two-fold evaluation, where the transcription accuracy and the reading-order (RO) goodness are considered separately.
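A minimal sketch of the two folds, assuming exact line matches and brute-force line alignment (feasible only for toy pages): text accuracy is an order-insensitive character error rate, and RO goodness is the fraction of line pairs emitted in the reference order. Neither is the paper's exact formulation.

```python
# Two-fold page-level assessment sketch: text accuracy and reading
# order are measured separately.
from itertools import permutations

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a single rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def assess_page(hyp_lines: list[str], ref_lines: list[str]):
    # Text fold: order-insensitive CER via the best line permutation.
    best = min(
        sum(edit_distance(h, r) for h, r in zip(perm, ref_lines))
        for perm in permutations(hyp_lines)
    )
    cer = best / sum(len(r) for r in ref_lines)
    # RO fold: fraction of line pairs read in the reference order
    # (assumes exact line matches for this toy example).
    idx = [ref_lines.index(h) for h in hyp_lines]
    pairs = [(i, j) for i in range(len(idx)) for j in range(i + 1, len(idx))]
    ro = sum(idx[i] < idx[j] for i, j in pairs) / len(pairs)
    return cer, ro

ref = ["first line", "second line", "third line"]
hyp = ["second line", "first line", "third line"]  # perfect text, wrong order
print(assess_page(hyp, ref))  # (0.0, 0.666...)
```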
arXiv Detail & Related papers (2023-01-14T15:43:07Z)
- Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement [57.72846454929923]
We create a benchmark dataset, HJQE, in which expert translators directly annotate poorly translated words.
We propose two tag correcting strategies, namely the tag refinement strategy and the tree-based annotation strategy, to make the TER-based artificial QE corpus closer to HJQE.
The results show that our proposed dataset is more consistent with human judgement and confirm the effectiveness of the proposed tag correcting strategies.
arXiv Detail & Related papers (2022-09-13T02:37:12Z)
- Gender Bias and Universal Substitution Adversarial Attacks on Grammatical Error Correction Systems for Automated Assessment [1.4213973379473654]
GEC systems are often used on speech transcriptions of English learners as a form of assessment and feedback.
The count of edits from a candidate's input sentence to a GEC system's grammatically corrected output sentence is indicative of a candidate's language ability.
This work examines a simple universal substitution adversarial attack that non-native speakers of English could realistically employ to deceive GEC systems used for assessment.
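The search loop behind such an attack can be sketched as follows; the GEC system here is a stub with an artificial blind spot and the candidate phrases are invented, so this only illustrates the optimization objective (minimize the edit count), not a working attack.

```python
# Sketch of a universal-phrase attack search: try candidate phrases
# prepended to every sentence and keep the one that minimizes the
# number of edits the GEC system makes (fewer edits reads as higher
# proficiency). A real attack would query the deployed model instead.
import difflib

def count_edits(source: str, corrected: str) -> int:
    sm = difflib.SequenceMatcher(a=source.split(), b=corrected.split())
    return sum(tag != "equal" for tag, *_ in sm.get_opcodes())

def gec_stub(sentence: str) -> str:
    """Stand-in GEC system with a toy blind spot: it only corrects
    sentences of at most five tokens, which padding can exploit."""
    if len(sentence.split()) <= 5:
        return sentence.replace("has went", "has gone")
    return sentence

sentences = ["She has went home", "He has went to work"]
candidates = ["", "perhaps", "I think that"]  # candidate universal phrases

def avg_edits(prefix: str) -> float:
    attacked = [f"{prefix} {s}".strip() for s in sentences]
    return sum(count_edits(s, gec_stub(s)) for s in attacked) / len(attacked)

best = min(candidates, key=avg_edits)
print(f"best phrase: {best!r}, avg edits: {avg_edits(best)}")
```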
arXiv Detail & Related papers (2022-08-19T17:44:13Z)
- Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on an entity-retrieval post-editing process.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
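A toy version of this retrieve-detect-substitute pipeline is sketched below; the capitalized-word "NER", the word-overlap retriever, and the first-entity substitution rule are naive stand-ins for RFEC's learned components.

```python
# Sketch of an entity-level factual correction pipeline: retrieve
# evidence sentences by overlap with the summary, then replace summary
# entities that never appear in the evidence.
import re

def entities(text: str) -> list[str]:
    """Naive 'NER': capitalized words (illustration only)."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)

def retrieve(summary: str, document: list[str], k: int = 2) -> list[str]:
    """Rank document sentences by word overlap with the summary."""
    s = set(summary.lower().split())
    return sorted(document, key=lambda sent: -len(s & set(sent.lower().split())))[:k]

def correct(summary: str, document: list[str]) -> str:
    evidence = retrieve(summary, document)
    evidence_ents = [e for sent in evidence for e in entities(sent)]
    out = summary
    for ent in entities(summary):
        if ent not in evidence_ents and evidence_ents:
            out = out.replace(ent, evidence_ents[0])  # naive substitution
    return out

document = [
    "Marie Curie won the Nobel Prize in 1903.",
    "She shared it with Pierre Curie and Henri Becquerel.",
]
summary = "Ada Curie won the Nobel Prize in 1903."
print(correct(summary, document))  # "Marie Curie won the Nobel Prize in 1903."
```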
arXiv Detail & Related papers (2022-04-18T11:35:02Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is a task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public GEC benchmarks, and it achieves competitive results.
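A minimal sketch of the core mechanism, masked attention over a dependency tree so that each token attends only to its syntactic neighbours; the dimensions, toy tree, and single attention head are illustrative, not SG-GEC's actual architecture.

```python
# Graph attention over a dependency tree: attention scores are masked
# by the tree's adjacency (plus self-loops).
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 5, 8
x = rng.normal(size=(n_tokens, d))        # token representations
W = rng.normal(size=(d, d)) / np.sqrt(d)  # shared projection

# Toy dependency tree for "He go to school yesterday": head indices,
# with self-loops so each node also attends to itself.
heads = [1, 1, 3, 1, 1]  # token -> head token
adj = np.eye(n_tokens, dtype=bool)
for child, head in enumerate(heads):
    adj[child, head] = adj[head, child] = True

h = x @ W
scores = h @ h.T / np.sqrt(d)  # scaled dot-product scores
scores[~adj] = -np.inf         # mask non-neighbours
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
out = weights @ h              # syntax-aware token updates
print(out.shape)               # (5, 8)
```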
arXiv Detail & Related papers (2021-11-05T07:07:48Z)