MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese
Grammatical Error Correction
- URL: http://arxiv.org/abs/2204.10994v1
- Date: Sat, 23 Apr 2022 05:20:38 GMT
- Title: MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese
Grammatical Error Correction
- Authors: Yue Zhang, Zhenghua Li, Zuyi Bao, Jiacheng Li, Bo Zhang, Chen Li, Fei
Huang, Min Zhang
- Abstract summary: MuCGEC is a multi-reference evaluation dataset for Chinese Grammatical Error Correction (CGEC).
It consists of 7,063 sentences collected from three different Chinese-as-a-Second-Language (CSL) learner sources.
Each sentence has been corrected by three annotators, and their corrections are meticulously reviewed by an expert, resulting in 2.3 references per sentence.
- Score: 51.3754092853434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents MuCGEC, a multi-reference multi-source evaluation dataset
for Chinese Grammatical Error Correction (CGEC), consisting of 7,063 sentences
collected from three different Chinese-as-a-Second-Language (CSL) learner
sources. Each sentence has been corrected by three annotators, and their
corrections are meticulously reviewed by an expert, resulting in 2.3 references
per sentence. We conduct experiments with two mainstream CGEC models, i.e., the
sequence-to-sequence (Seq2Seq) model and the sequence-to-edit (Seq2Edit) model,
both enhanced with large pretrained language models (PLMs), achieving
competitive benchmark performance on both previous datasets and ours. We also discuss
CGEC evaluation methodologies, including the effect of multiple references and
using a char-based metric. Our annotation guidelines, data, and code are
available at https://github.com/HillZhang1999/MuCGEC.
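To picture the multi-reference, char-based evaluation the abstract describes, here is a minimal sketch: char-level edits are extracted from source/target pairs, the hypothesis's edits are scored against each reference's edits with an edit-level F0.5, and the best-matching reference wins. This is only an illustration built on Python's difflib; the official char-based scorer in the linked repo (ChERRANT) is far more careful about edit extraction and classification.

```python
import difflib

def char_edits(src: str, tgt: str) -> set:
    """Extract char-level edits (op, src_start, src_end, replacement)."""
    sm = difflib.SequenceMatcher(a=src, b=tgt)
    return {(op, i1, i2, tgt[j1:j2])
            for op, i1, i2, j1, j2 in sm.get_opcodes() if op != "equal"}

def f_beta(hyp_edits: set, ref_edits: set, beta: float = 0.5) -> float:
    """Edit-level F_beta; beta=0.5 weights precision twice as much as recall."""
    tp = len(hyp_edits & ref_edits)
    p = tp / len(hyp_edits) if hyp_edits else 1.0
    r = tp / len(ref_edits) if ref_edits else 1.0
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

def multi_ref_score(src: str, hyp: str, refs: list) -> float:
    """Score a correction against every reference and keep the best match."""
    hyp_edits = char_edits(src, hyp)
    return max(f_beta(hyp_edits, char_edits(src, ref)) for ref in refs)

src = "我在北京学习中文了三年。"
hyp = "我在北京学习中文三年了。"
refs = ["我在北京学了三年中文。", "我在北京学习中文三年了。"]
print(multi_ref_score(src, hyp, refs))  # 1.0: matches the second reference
```

Taking the maximum over references is what makes multiple references valuable: a system is not penalized for choosing a different, equally valid correction.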
Related papers
- System Report for CCL24-Eval Task 7: Multi-Error Modeling and Fluency-Targeted Pre-training for Chinese Essay Evaluation [1.8856984887896766]
We optimized predictions for challenging fine-grained error types using binary classification models and trained coarse-grained models on the Chinese Learner 4W corpus.
In Track 2, we enhanced performance by constructing a pseudo-dataset with multiple error types per sentence.
In Track 3, where we achieved first place, we generated fluency-rated pseudo-data via back-translation for pre-training and used an NSP-based strategy.
arXiv Detail & Related papers (2024-07-11T06:17:08Z)
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles of subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that human annotators prefer SQC-Score over the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- Improving Seq2Seq Grammatical Error Correction via Decoding Interventions [40.52259641181596]
We propose a unified decoding intervention framework that employs an external critic to incrementally assess the appropriateness of each token as it is generated.
We discover and investigate two types of critics: a pre-trained left-to-right language model critic and an incremental target-side grammatical error detector critic.
Our framework consistently outperforms strong baselines and achieves results competitive with state-of-the-art methods.
arXiv Detail & Related papers (2023-10-23T03:36:37Z)
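To make the decoding-intervention idea above concrete, here is a toy sketch in which a critic re-scores the base GEC model's candidate tokens at every decoding step. The stand-in scoring callables and the linear interpolation weight `alpha` are invented for illustration; the paper's actual formulation and critics (a left-to-right LM and an incremental error detector) are more elaborate.

```python
import math

def decode_with_critic(base_step, critic_step, bos, eos, alpha=0.5, max_len=20):
    """Greedy decoding where an external critic re-scores candidate tokens.

    base_step(prefix)   -> dict token -> log-prob from the base GEC model
    critic_step(prefix) -> dict token -> log-prob from the critic;
    both are hypothetical stand-ins for real model calls.
    """
    prefix = [bos]
    while len(prefix) < max_len:
        base = base_step(prefix)
        critic = critic_step(prefix)
        # Interpolate the two log-probabilities token by token.
        scores = {t: (1 - alpha) * lp + alpha * critic.get(t, -math.inf)
                  for t, lp in base.items()}
        token = max(scores, key=scores.get)
        prefix.append(token)
        if token == eos:
            break
    return prefix

# Toy demo: the critic steers decoding toward ending the sentence.
base = lambda prefix: {"word": -0.6, "<eos>": -0.9}
critic = lambda prefix: {"word": -2.0, "<eos>": -0.1}
print(decode_with_critic(base, critic, "<bos>", "<eos>"))
# ['<bos>', '<eos>']
```

The point of the mechanism is that the critic can veto or boost individual candidate tokens without retraining the base model.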
- FlaCGEC: A Chinese Grammatical Error Correction Dataset with Fine-grained Linguistic Annotation [11.421545095092815]
FlaCGEC is a new CGEC dataset featuring fine-grained linguistic annotation.
We collect a raw corpus, apply rule-based edits to sentences following the linguistic schema defined by Chinese language experts, and manually refine the generated samples.
We evaluate various cutting-edge CGEC methods on FlaCGEC; their modest results indicate that the dataset is challenging, as it covers a wide range of grammatical errors.
arXiv Detail & Related papers (2023-09-26T10:22:43Z)
- Linguistic Rules-Based Corpus Generation for Native Chinese Grammatical Error Correction [36.74272211767197]
We propose a linguistic rules-based approach to construct large-scale CGEC training corpora with automatically generated grammatical errors.
We present a challenging CGEC benchmark derived entirely from errors made by native Chinese speakers in real-world scenarios.
arXiv Detail & Related papers (2022-10-19T10:20:39Z)
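As an illustration of the rule-based corpus-generation idea in the entry above, the sketch below injects synthetic grammatical errors into clean Chinese sentences to build (erroneous, correct) training pairs. The three rules here (redundant insertion, deletion, adjacent swap) are generic placeholders; the paper derives its rules from linguistic knowledge of errors made by native speakers.

```python
import random

# Hypothetical error-injection rules; real systems derive these from
# linguistic analyses of native-speaker errors.
def insert_redundant(tokens):          # redundant-word errors
    i = random.randrange(len(tokens))
    return tokens[:i] + [tokens[i]] + tokens[i:]

def delete_token(tokens):              # missing-word errors
    i = random.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]

def swap_adjacent(tokens):             # word-order errors
    if len(tokens) < 2:
        return tokens
    tokens = tokens[:]
    i = random.randrange(len(tokens) - 1)
    tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens

RULES = [insert_redundant, delete_token, swap_adjacent]

def corrupt(sentence, n_errors=1):
    """Build a (noisy, clean) CGEC training pair from a clean sentence."""
    tokens = list(sentence)            # char-level for simplicity
    for _ in range(n_errors):
        tokens = random.choice(RULES)(tokens)
    return "".join(tokens), sentence

random.seed(0)
print(corrupt("他昨天去了图书馆。"))
```

Because the clean side is the original sentence, pairs generated this way can be produced at corpus scale with no manual annotation.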
- BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing [55.058258437125524]
We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing.
We benchmark eight language models, including two GPT-3 variants available only through an API.
Our experiments show that encoder-decoder pretrained language models can achieve similar performance or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
arXiv Detail & Related papers (2022-06-21T18:34:11Z)
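The "constrained to be valid" part of the entry above can be pictured as masking, at each decoding step, every token the target grammar does not allow. Below is a tiny sketch with a hard-coded next-token table standing in for a real grammar and a stub model; BenchCLAMP itself derives such constraints from grammars for each parsing task.

```python
# Tiny sketch of grammar-constrained greedy decoding. ALLOWED is a stand-in
# for a real grammar's next-token function; model_scores is a stub model.
ALLOWED = {
    "<bos>":  {"("},
    "(":      {"intent", "city"},
    "intent": {")"},
    "city":   {")"},
    ")":      {"<eos>"},
}

def model_scores(prefix):
    # Stub log-probs: unconstrained greedy would pick the invalid ")" first.
    return {"(": -1.5, "intent": -0.9, "city": -1.2, ")": -0.3, "<eos>": -2.0}

def constrained_greedy(prefix=("<bos>",)):
    out = list(prefix)
    while out[-1] != "<eos>":
        legal = ALLOWED.get(out[-1], {"<eos>"})
        scores = {t: s for t, s in model_scores(out).items() if t in legal}
        out.append(max(scores, key=scores.get))
    return out

print(constrained_greedy())  # ['<bos>', '(', 'intent', ')', '<eos>']
```

Filtering the model's scores to the legal set guarantees every decoded parse is well-formed, which is what lets constrained models match or surpass specialized parsers in the benchmark's experiments.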
- A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model [100.67378875773495]
We propose a generic and language-independent strategy for multilingual Grammatical Error Correction.
Our approach creates diverse parallel GEC data without any language-specific operations.
It achieves state-of-the-art results on the NLPCC 2018 Task 2 dataset (Chinese) and obtains competitive performance on Falko-Merlin (German) and RULEC-GEC (Russian).
arXiv Detail & Related papers (2022-01-26T02:10:32Z)
- On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
- Consistency Regularization for Cross-Lingual Fine-Tuning [61.08704789561351]
We propose to improve cross-lingual fine-tuning with consistency regularization.
Specifically, we use example consistency regularization to penalize the prediction sensitivity to four types of data augmentations.
Experimental results on the XTREME benchmark show that our method significantly improves cross-lingual fine-tuning across various tasks.
arXiv Detail & Related papers (2021-06-15T15:35:44Z)
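A minimal sketch of the example consistency regularization described in the entry above: the model is penalized when its prediction on an augmented input diverges from its prediction on the original. The symmetric-KL penalty, the toy distributions, and the weight `lam` are illustrative choices, not the paper's exact objective; the paper applies this idea across four augmentation types.

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def consistency_loss(p_orig, p_aug):
    """Symmetric KL penalty: predictions should be stable under augmentation."""
    return 0.5 * (kl(p_orig, p_aug) + kl(p_aug, p_orig))

# Toy class distributions from the same model on an example and on its
# augmented version (numbers invented for illustration).
p_orig = [0.7, 0.2, 0.1]
p_aug  = [0.5, 0.3, 0.2]

task_loss = 0.42  # stand-in for the usual cross-entropy term
lam = 1.0         # weight on the consistency term (hyperparameter)
total = task_loss + lam * consistency_loss(p_orig, p_aug)
print(round(total, 4))
```

Because the penalty only compares the model's own predictions, it needs no extra labels and can be added on top of any cross-lingual fine-tuning objective.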