RE$^2$: Improving Chinese Grammatical Error Correction via Retrieving Appropriate Examples with Explanation
- URL: http://arxiv.org/abs/2509.26038v1
- Date: Tue, 30 Sep 2025 10:14:19 GMT
- Title: RE$^2$: Improving Chinese Grammatical Error Correction via Retrieving Appropriate Examples with Explanation
- Authors: Baoxin Wang, Yumeng Luo, Yixuan Wang, Dayong Wu, Wanxiang Che, Shijin Wang
- Abstract summary: The primary objective of Chinese grammatical error correction (CGEC) is to detect and correct errors in Chinese sentences. For large language models (LLMs), selecting appropriate reference examples can help improve their performance. We propose a method named RE$^2$, which retrieves appropriate examples with explanations of grammatical errors.
- Score: 44.80444520411601
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The primary objective of Chinese grammatical error correction (CGEC) is to detect and correct errors in Chinese sentences. Recent research shows that large language models (LLMs) have been applied to CGEC with significant results. For LLMs, selecting appropriate reference examples can help improve their performance. However, existing methods predominantly rely on text similarity for example retrieval, a strategy that frequently mismatches actual error patterns and retrieves lexically similar yet grammatically irrelevant sentences. To address this problem, we propose a method named RE$^2$, which retrieves appropriate examples with explanations of grammatical errors. Instead of using text similarity of the input sentence, we use explanations of grammatical errors to select reference examples, which are used by LLMs to improve the performance of CGEC. We conduct experiments on two CGEC datasets and create a high-quality grammatical error explanation (GEE) dataset, which is not only used in our research but also serves as a valuable resource for future studies in both CGEC and GEE. The experimental results on the two datasets indicate that our proposed method effectively improves the performance of CGEC.
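The retrieval idea in the abstract (rank reference examples by how similar their error explanations are, rather than by how similar the sentences look) can be illustrated with a minimal sketch. This is not the authors' implementation. The `Example` class, the character-bigram cosine similarity, the prompt layout, and the three toy examples are all assumptions made for illustration. A real system would presumably use an LLM to produce the explanation for the input sentence and a learned encoder for similarity.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt


@dataclass
class Example:
    source: str       # sentence containing a grammatical error
    target: str       # corrected sentence
    explanation: str  # natural-language explanation of the error


def bigrams(text: str) -> Counter:
    """Character-bigram counts; a stand-in (assumption) for a real text encoder."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query_explanation: str, pool: list[Example], k: int = 2) -> list[Example]:
    """Rank pool examples by explanation similarity, not by sentence similarity."""
    q = bigrams(query_explanation)
    ranked = sorted(pool, key=lambda ex: cosine(q, bigrams(ex.explanation)), reverse=True)
    return ranked[:k]


def build_prompt(sentence: str, examples: list[Example]) -> str:
    """Assemble a few-shot correction prompt (hypothetical layout)."""
    parts = []
    for ex in examples:
        parts.append(f"Input: {ex.source}\nExplanation: {ex.explanation}\nOutput: {ex.target}")
    parts.append(f"Input: {sentence}\nOutput:")
    return "\n\n".join(parts)


# Toy pool written for this sketch; not taken from the paper's GEE dataset.
POOL = [
    Example(
        source="通过这次活动，使我明白了很多。",
        target="这次活动使我明白了很多。",
        explanation="Missing subject: the preposition phrase and the causative verb "
                    "leave the sentence without a subject.",
    ),
    Example(
        source="我们要发扬和继承优良传统。",
        target="我们要继承和发扬优良传统。",
        explanation="Improper word order: the two coordinated verbs appear in an "
                    "illogical sequence.",
    ),
    Example(
        source="他大约三十岁左右。",
        target="他大约三十岁。",
        explanation="Redundant words: two expressions of approximation are used together.",
    ),
]
```

In this sketch, a query explanation describing a missing subject pulls back the missing-subject example even when the input sentence shares few characters with it, which is the mismatch that sentence-level text similarity is said to suffer from.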
Related papers
- Adapting LLMs for Minimal-edit Grammatical Error Correction [0.0]
We explore the error rate adaptation topic and propose a novel training schedule method. Our experiments set a new state-of-the-art result for a single-model system on the BEA-test set. We analyze whether training on detokenized datasets impacts the results and measure the impact of the usage of datasets with corrected erroneous examples.
arXiv Detail & Related papers (2025-06-16T07:00:48Z)
- Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction [19.95974494301433]
Grammatical error correction (GEC) aims to correct grammatical, spelling, and semantic errors in natural language text. We propose a novel retrieval method based on natural language grammatical error explanations (GEE). Our method retrieves suitable few-shot demonstrations by matching the GEE of the test input with that of pre-constructed database samples.
arXiv Detail & Related papers (2025-02-12T15:41:43Z)
- Loss-Aware Curriculum Learning for Chinese Grammatical Error Correction [21.82403446634522]
Chinese grammatical error correction (CGEC) aims to detect and correct errors in the input Chinese sentences. Current approaches ignore that correction difficulty varies across different instances and treat these samples equally. We propose a multi-granularity Curriculum Learning framework to address this problem.
arXiv Detail & Related papers (2024-12-31T08:11:49Z)
- Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction [8.655807096424732]
In this paper, we propose a novel ungrammatical-syntax-based in-context example selection strategy for grammatical error correction.
Specifically, we measure similarity of sentences based on their syntactic structures with diverse algorithms, and identify optimal ICL examples sharing the most similar ill-formed syntax to the test input.
arXiv Detail & Related papers (2024-03-28T10:05:57Z)
- LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction [49.0746090186582]
Over-correction is a critical problem in the Chinese grammatical error correction (CGEC) task.
Recent work using model ensemble methods can effectively mitigate over-correction and improve the precision of the GEC system.
We propose the LM-Combiner, a rewriting model that can directly modify the over-correction of GEC system outputs without a model ensemble.
arXiv Detail & Related papers (2024-03-26T06:12:21Z)
- MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction [51.3754092853434]
MuCGEC is a multi-reference evaluation dataset for Chinese Grammatical Error Correction (CGEC).
It consists of 7,063 sentences collected from three different Chinese-as-a-Second-Language (CSL) learner sources.
Each sentence has been corrected by three annotators, and their corrections are meticulously reviewed by an expert, resulting in 2.3 references per sentence.
arXiv Detail & Related papers (2022-04-23T05:20:38Z)
- Interpretability for Language Learners Using Example-Based Grammatical Error Correction [27.850970793739933]
We introduce an Example-Based GEC (EB-GEC) that presents examples to language learners as a basis for a correction result.
Experiments demonstrate that the examples presented by EB-GEC help language learners decide to accept or refuse suggestions from the GEC output.
arXiv Detail & Related papers (2022-03-14T13:15:00Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is a task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public benchmarks of the GEC task, and it achieves competitive results.
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- LM-Critic: Language Models for Unsupervised Grammatical Error Correction [128.9174409251852]
We show how to leverage a pretrained language model (LM) in defining an LM-Critic, which judges a sentence to be grammatical.
We apply this LM-Critic and BIFI along with a large set of unlabeled sentences to bootstrap realistic ungrammatical / grammatical pairs for training a corrector.
arXiv Detail & Related papers (2021-09-14T17:06:43Z)
- Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction [106.63733511672721]
We propose a novel language-independent approach to improve the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. ESC leverages a seq2seq model to take the sentence with annotated erroneous spans as input and only outputs the corrected text for these spans.
Experiments show our approach performs comparably to conventional seq2seq approaches in both English and Chinese GEC benchmarks with less than 50% time cost for inference.
arXiv Detail & Related papers (2020-10-07T08:29:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.