Do Grammatical Error Correction Models Realize Grammatical Generalization?
- URL: http://arxiv.org/abs/2106.03031v1
- Date: Sun, 6 Jun 2021 04:59:29 GMT
- Title: Do Grammatical Error Correction Models Realize Grammatical Generalization?
- Authors: Masato Mita and Hitomi Yanaka
- Abstract summary: This study explores to what extent GEC models generalize grammatical knowledge required for correcting errors.
We found that a current standard Transformer-based GEC model fails to realize grammatical generalization even in simple settings.
- Score: 8.569720582920416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been an increased interest in data generation approaches to
grammatical error correction (GEC) using pseudo data. However, these approaches
suffer from several issues that make them inconvenient for real-world
deployment, including a demand for large amounts of training data. On the other
hand, some errors based on grammatical rules may not necessarily require a
large amount of data if GEC models can realize grammatical generalization. This
study explores to what extent GEC models generalize grammatical knowledge
required for correcting errors. We introduce an analysis method using synthetic
and real GEC datasets with controlled vocabularies to evaluate whether models
can generalize to unseen errors. We found that a current standard
Transformer-based GEC model fails to realize grammatical generalization even in
simple settings with limited vocabulary and syntax, suggesting that it lacks
the generalization ability required to correct errors from provided training
examples.
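The evaluation idea can be illustrated with a toy construction. The vocabulary, grammar rule, and split below are hypothetical stand-ins, not the paper's actual data: generate subject-verb agreement errors over a small controlled vocabulary, then hold out noun-verb combinations so that every word, but not every combination, appears in training. A model that has generalized the agreement rule should still correct the held-out combinations.

```python
import itertools

# Hypothetical controlled vocabulary (not the paper's actual word lists).
NOUNS = ["dog", "cat", "bird", "child"]
VERBS = ["run", "jump", "sing", "sleep"]

def third_person(verb: str) -> str:
    """Naive 3rd-person-singular inflection; adequate for this toy vocabulary."""
    return verb + "s"

def make_pair(noun: str, verb: str):
    """Return an (erroneous, corrected) pair with a subject-verb agreement error."""
    wrong = f"The {noun} {verb} every day ."
    right = f"The {noun} {third_person(verb)} every day ."
    return wrong, right

# Hold out the "diagonal" combinations: each held-out noun and verb still
# occurs in training (with other partners), but never together, so the
# test errors are unseen as combinations.
held_out = {(NOUNS[i], VERBS[i]) for i in range(len(NOUNS))}
train = [make_pair(n, v)
         for n, v in itertools.product(NOUNS, VERBS) if (n, v) not in held_out]
test = [make_pair(n, v) for n, v in sorted(held_out)]
```

With 4 nouns and 4 verbs this yields 12 training pairs and 4 test pairs whose exact noun-verb combinations never occur in training.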
Related papers
- Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z)
- LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction [49.0746090186582]
Over-correction is a critical problem in Chinese grammatical error correction (CGEC) task.
Recent work using model ensemble methods can effectively mitigate over-correction and improve the precision of the GEC system.
We propose the LM-Combiner, a rewriting model that can directly modify the over-correction of GEC system outputs without a model ensemble.
arXiv Detail & Related papers (2024-03-26T06:12:21Z)
- Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation [12.15509670220182]
Grammatical error correction (GEC) is a well-explored problem in English.
Research on GEC in morphologically rich languages has been limited due to challenges such as data scarcity and language complexity.
We present the first results on Arabic GEC using two newly developed Transformer-based pretrained sequence-to-sequence models.
arXiv Detail & Related papers (2023-05-24T05:12:58Z)
- Judge a Sentence by Its Content to Generate Grammatical Errors [0.0]
We propose a learning-based two-stage method for synthetic data generation for grammatical error correction.
We show that a GEC model trained on our synthetically generated corpus outperforms models trained on synthetic data from prior work.
arXiv Detail & Related papers (2022-08-20T14:31:34Z)
- ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction [30.917993017459615]
We present a novel parallel grammatical error correction (GEC) dataset drawn from open-domain conversations.
This dataset is, to our knowledge, the first GEC dataset targeted to a conversational setting.
To demonstrate the utility of the dataset, we use our annotated data to fine-tune a state-of-the-art GEC model.
arXiv Detail & Related papers (2021-12-15T20:27:40Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is a task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public benchmarks of GEC task and it achieves competitive results.
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- LM-Critic: Language Models for Unsupervised Grammatical Error Correction [128.9174409251852]
We show how to leverage a pretrained language model (LM) in defining an LM-Critic, which judges a sentence to be grammatical.
We apply this LM-Critic and BIFI along with a large set of unlabeled sentences to bootstrap realistic ungrammatical / grammatical pairs for training a corrector.
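The critic's criterion can be sketched as a local-optimum check: a sentence is judged grammatical if no sentence in a small perturbation neighborhood scores higher. The scorer and neighborhood below are toy stand-ins (the actual method uses a pretrained LM's probability and richer character/word edits), so this is illustrative only.

```python
from typing import Callable, Iterable

def word_swap_neighbors(sentence: str) -> Iterable[str]:
    """One simple perturbation family: swap each pair of adjacent words."""
    words = sentence.split()
    for i in range(len(words) - 1):
        swapped = words[:i] + [words[i + 1], words[i]] + words[i + 2:]
        yield " ".join(swapped)

def lm_critic(sentence: str, score: Callable[[str], float]) -> bool:
    """Judge a sentence grammatical iff no neighbor scores higher."""
    s = score(sentence)
    return all(score(n) <= s for n in word_swap_neighbors(sentence))

# Toy stand-in for an LM score: count of "known" bigrams in the sentence.
GOOD_BIGRAMS = {("the", "cat"), ("cat", "sat"), ("sat", "down")}

def toy_score(sentence: str) -> float:
    words = sentence.lower().split()
    return sum((a, b) in GOOD_BIGRAMS for a, b in zip(words, words[1:]))
```

Here `lm_critic("the cat sat down", toy_score)` returns True (every swap lowers the score), while a scrambled variant such as "cat the sat down" is rejected because its unscrambled neighbor scores higher.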
arXiv Detail & Related papers (2021-09-14T17:06:43Z)
- Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models [15.481446439370343]
We use error type tags from automatic annotation tools such as ERRANT to guide synthetic data generation.
We build a new, large synthetic pre-training data set with error tag frequency distributions matching a given development set.
Our approach is particularly effective in adapting a GEC system, trained on mixed native and non-native English, to a native English test set.
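The tag-guided generation step can be sketched as follows. The tag inventory, frequencies, and corruption rules here are hypothetical stand-ins (real ERRANT tags come from automatic annotation, and the paper uses learned corruption models, not rules): sample an error tag per clean sentence according to a target frequency distribution, then apply a corruption of that type.

```python
import random

# Hypothetical target tag frequencies in ERRANT-style notation
# (not taken from the paper or any development set).
TAG_DIST = {"R:VERB:SVA": 0.5, "M:DET": 0.3, "U:PREP": 0.2}

def corrupt(sentence: str, tag: str) -> str:
    """Toy rule-based corruptions, one per error tag (illustrative only)."""
    words = sentence.split()
    if tag == "R:VERB:SVA":
        # Break agreement by stripping a final -s.
        return " ".join(w[:-1] if w.endswith("s") else w for w in words)
    if tag == "M:DET":
        # Delete determiners.
        return " ".join(w for w in words if w.lower() not in {"the", "a", "an"})
    if tag == "U:PREP":
        # Append a spurious preposition.
        return " ".join(words + ["of"])
    return sentence

def tagged_corruptions(sentences, dist, seed=0):
    """Sample a tag per sentence from the target distribution, then corrupt."""
    rng = random.Random(seed)
    tags, weights = zip(*dist.items())
    out = []
    for s in sentences:
        tag = rng.choices(tags, weights=weights)[0]
        out.append((corrupt(s, tag), s, tag))  # (source, target, tag)
    return out
```

Matching the sampled tag frequencies to a development set's error profile is what adapts the synthetic pre-training data to the target domain.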
arXiv Detail & Related papers (2021-05-27T17:17:21Z)
- Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction [98.31440090585376]
Grammatical Error Correction (GEC) aims to correct writing errors and help language learners improve their writing skills.
Existing GEC models tend to produce spurious corrections or fail to detect many errors.
This paper presents the Neural Verification Network (VERNet) for GEC quality estimation with multiple hypotheses.
arXiv Detail & Related papers (2021-05-10T15:04:25Z)
- Towards Minimal Supervision BERT-based Grammar Error Correction [81.90356787324481]
We incorporate contextual information from a pre-trained language model to leverage annotations and benefit multilingual scenarios.
Results show the strong potential of Bidirectional Encoder Representations from Transformers (BERT) for the grammatical error correction task.
arXiv Detail & Related papers (2020-01-10T15:45:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.