A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction
- URL: http://arxiv.org/abs/2111.03294v1
- Date: Fri, 5 Nov 2021 07:07:48 GMT
- Title: A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction
- Authors: Zhaohong Wan and Xiaojun Wan
- Abstract summary: Grammatical Error Correction (GEC) is a task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public GEC benchmarks, where it achieves competitive results.
- Score: 83.14159143179269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Grammatical Error Correction (GEC) is a task of detecting and correcting
grammatical errors in sentences. Recently, neural machine translation systems
have become popular approaches for this task. However, these methods do not make
use of syntactic knowledge, which plays an important role in the correction of
grammatical errors. In this work, we propose a syntax-guided GEC model (SG-GEC)
which adopts the graph attention mechanism to utilize the syntactic knowledge
of dependency trees. Considering that the dependency trees of grammatically
incorrect source sentences might provide incorrect syntactic knowledge, we
propose a dependency tree correction task to deal with this problem. Combined
with a data augmentation method, our model achieves strong performance without
using any large pre-trained models. We evaluate our model on public GEC
benchmarks, where it achieves competitive results.
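The abstract does not spell out how graph attention consumes the dependency tree. As a minimal sketch, assuming the tree is treated as an undirected graph with self-loops and that each token attends only to its tree neighbours using plain scaled dot-product attention, the masking step could look like the block below. This is a simplification: GAT-style layers usually add learned projections and a LeakyReLU scoring function, and SG-GEC's exact formulation may differ. The function names and the head encoding (`-1` for the root) are hypothetical.

```python
import math
from typing import List


def dependency_adjacency(heads: List[int]) -> List[List[bool]]:
    """Symmetric adjacency mask built from dependency heads.

    heads[i] is the index of token i's syntactic head, or -1 for the root
    (a hypothetical encoding). Every token is also linked to itself.
    """
    n = len(heads)
    adj = [[i == j for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = True
            adj[head][child] = True
    return adj


def masked_attention(vectors: List[List[float]],
                     adj: List[List[bool]]) -> List[List[float]]:
    """Scaled dot-product self-attention restricted to tree neighbours.

    Token i only attends to tokens j with adj[i][j] True; all other tokens
    get zero weight, so they cannot influence token i's output.
    """
    d = len(vectors[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for i, query in enumerate(vectors):
        scores = {
            j: scale * sum(q * k for q, k in zip(query, key))
            for j, key in enumerate(vectors)
            if adj[i][j]
        }
        # Numerically stable softmax over the neighbour scores only.
        top = max(scores.values())
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exps.values())
        outputs.append([
            sum(exps[j] / total * vectors[j][t] for j in exps)
            for t in range(d)
        ])
    return outputs


# "He go to school": "go" is the root; "He" and "school" depend on "go";
# "to" depends on "school". The vectors are arbitrary toy embeddings.
heads = [1, -1, 3, 1]
adj = dependency_adjacency(heads)
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
out = masked_attention(vecs, adj)
```

Because every token keeps a self-loop, the softmax never runs over an empty set. The sketch also shows why a wrong parse matters: the mask decides which tokens can exchange information, and for an ungrammatical source sentence that parse may be incorrect, which is the problem the dependency tree correction task is meant to address.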
Related papers
- Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct the potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
(arXiv, 2023-08-17T06:04:28Z)
- CSynGEC: Incorporating Constituent-based Syntax for Grammatical Error Correction with a Tailored GEC-Oriented Parser [22.942594068051488]
This work considers another mainstream syntax formalism, i.e., constituent-based syntax.
We first propose an extended constituent-based syntax scheme to accommodate errors in ungrammatical sentences.
Then, we automatically obtain constituency trees of ungrammatical sentences to train a GEC-oriented constituency parser.
(arXiv, 2022-11-15T14:11:39Z)
- Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition [52.55136323341319]
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that humans cannot easily recognize them.
(arXiv, 2022-04-15T13:55:32Z)
- LM-Critic: Language Models for Unsupervised Grammatical Error Correction [128.9174409251852]
We show how to leverage a pretrained language model (LM) to define an LM-Critic, which judges whether a sentence is grammatical.
We apply this LM-Critic and BIFI (Break-It-Fix-It), along with a large set of unlabeled sentences, to bootstrap realistic ungrammatical/grammatical pairs for training a corrector.
(arXiv, 2021-09-14T17:06:43Z)
- Grammatical Error Correction as GAN-like Sequence Labeling [45.19453732703053]
We propose a GAN-like sequence labeling model, which consists of a grammatical error detector as a discriminator and a grammatical error labeler with Gumbel-Softmax sampling as a generator.
Our results on several evaluation benchmarks demonstrate that our proposed approach is effective and improves on the previous state-of-the-art baseline.
(arXiv, 2021-05-29T04:39:40Z)
- Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models [15.481446439370343]
We use error type tags from automatic annotation tools such as ERRANT to guide synthetic data generation.
We build a new, large synthetic pre-training data set with error tag frequency distributions matching a given development set.
Our approach is particularly effective in adapting a GEC system, trained on mixed native and non-native English, to a native English test set.
(arXiv, 2021-05-27T17:17:21Z)
- Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction [106.63733511672721]
We propose a novel language-independent approach to improve the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. ESC leverages a seq2seq model to take the sentence with annotated erroneous spans as input and only outputs the corrected text for these spans.
Experiments show our approach performs comparably to conventional seq2seq approaches in both English and Chinese GEC benchmarks with less than 50% time cost for inference.
(arXiv, 2020-10-07T08:29:11Z)
- Adversarial Grammatical Error Correction [2.132096006921048]
We present an adversarial learning approach to Grammatical Error Correction (GEC) using the generator-discriminator framework.
We pre-train both the discriminator and the generator on parallel texts and then fine-tune them further using a policy gradient method.
Experimental results on FCE, CoNLL-14, and BEA-19 datasets show that Adversarial-GEC can achieve competitive GEC quality compared to NMT-based baselines.
(arXiv, 2020-10-06T00:31:33Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
(arXiv, 2020-05-12T11:01:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
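The LM-Critic entry in the list says only that the critic judges whether a sentence is grammatical. In the LM-Critic paper the criterion is a local-optimum check: a sentence is accepted if the LM assigns it at least as high a probability as every sentence in a neighbourhood of small perturbations. A minimal sketch of that check follows. The log-probability table and the one-word-swap neighbourhood are made-up toy stand-ins for a real pretrained LM and the paper's perturbation function, and every helper name is hypothetical.

```python
from typing import Callable, Dict, Iterator, List


def is_grammatical(sentence: str,
                   log_prob: Callable[[str], float],
                   neighbours: Callable[[str], Iterator[str]]) -> bool:
    """Accept `sentence` if no neighbouring sentence scores higher under the LM."""
    score = log_prob(sentence)
    return all(log_prob(alt) <= score for alt in neighbours(sentence))


# Toy stand-ins: hypothetical numbers, not scores from a real LM.
TOY_LOG_PROBS: Dict[str, float] = {
    "he goes home": -5.0,
    "he go home": -7.5,
    "he goes homes": -9.0,
    "he go homes": -11.0,
}
SWAPS: Dict[str, List[str]] = {
    "go": ["goes"], "goes": ["go"], "home": ["homes"], "homes": ["home"],
}


def toy_log_prob(sentence: str) -> float:
    # Unknown sentences get a very low score.
    return TOY_LOG_PROBS.get(sentence, -20.0)


def toy_neighbours(sentence: str) -> Iterator[str]:
    """Yield every sentence reachable by swapping one word via SWAPS."""
    words = sentence.split()
    for i, word in enumerate(words):
        for alt in SWAPS.get(word, []):
            yield " ".join(words[:i] + [alt] + words[i + 1:])


print(is_grammatical("he goes home", toy_log_prob, toy_neighbours))  # True
print(is_grammatical("he go home", toy_log_prob, toy_neighbours))    # False
```

The neighbourhood function does much of the work here: if it cannot generate the corrected alternative, an ungrammatical sentence is accepted by default.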
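The GAN-like sequence labeling entry samples labels with Gumbel-Softmax, a relaxation typically used so that discrete choices remain differentiable during adversarial training. A minimal sketch of the standard Gumbel-Softmax sample in plain Python; how the cited model feeds these soft labels to its error detector is not described in the list, and the helper name and label set are hypothetical.

```python
import math
import random
from typing import List, Optional


def gumbel_softmax(logits: List[float], temperature: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Relaxed one-hot sample from the categorical distribution softmax(logits).

    Adds Gumbel(0, 1) noise g = -log(-log(u)), u ~ Uniform(0, 1), to each
    logit and applies a softmax at the given temperature. Lower temperatures
    push the output closer to a one-hot vector.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    top = max(noisy)
    exps = [math.exp(x - top) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["KEEP", "DELETE", "REPLACE"]  # hypothetical edit labels
probs = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
hard = labels[max(range(len(probs)), key=probs.__getitem__)]
```

During training the soft vector `probs` is what gets passed on; taking the argmax (as in `hard`) yields a discrete label but is not differentiable.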
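The tagged-corruption entry conditions synthetic data generation on ERRANT error tags whose frequency distribution matches a development set. A minimal sketch of only the tag-sampling step, assuming the dev-set tags have already been extracted (for example with ERRANT); the short tag list below is illustrative rather than real counts, the helper names are hypothetical, and the corruption model that would turn each sampled tag into an actual error is not shown.

```python
import random
from collections import Counter
from typing import Dict, List


def tag_distribution(dev_tags: List[str]) -> Dict[str, float]:
    """Relative frequency of each error tag in the development set."""
    counts = Counter(dev_tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def sample_corruption_tags(dist: Dict[str, float], k: int,
                           rng: random.Random) -> List[str]:
    """Draw k tags whose expected frequencies match `dist`."""
    tags = list(dist)
    return rng.choices(tags, weights=[dist[t] for t in tags], k=k)


# Illustrative dev-set tags, not taken from any real corpus.
dev_tags = ["R:VERB:SVA", "M:DET", "M:DET", "R:PREP"]
dist = tag_distribution(dev_tags)
plan = sample_corruption_tags(dist, k=1000, rng=random.Random(0))
```

Each sampled tag would then tell the corruption model which kind of error to introduce into a clean sentence, so the synthetic corpus inherits the development set's error profile.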
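The ESD/ESC entry splits correction into tagging erroneous spans and then rewriting only those spans. A minimal sketch of the glue between the two stages, assuming the detector emits one binary tag per token (1 = erroneous); the `<e>`/`</e>` markers and all function names are hypothetical, and the corrections are supplied by hand here in place of the seq2seq model's output.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def tags_to_spans(tags: List[int]) -> List[Span]:
    """Convert per-token binary tags (1 = erroneous) into [start, end) spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == 1 and start is None:
            start = i
        elif tag != 1 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def annotate(tokens: List[str], spans: List[Span],
             open_tag: str = "<e>", close_tag: str = "</e>") -> List[str]:
    """Wrap each erroneous span in marker tokens for the correction model."""
    out, pos = [], 0
    for start, end in spans:
        out.extend(tokens[pos:start])
        out.append(open_tag)
        out.extend(tokens[start:end])
        out.append(close_tag)
        pos = end
    out.extend(tokens[pos:])
    return out


def apply_corrections(tokens: List[str], spans: List[Span],
                      corrections: List[List[str]]) -> List[str]:
    """Splice corrected span texts back in; untagged tokens are copied as-is."""
    out, pos = [], 0
    for (start, end), fix in zip(spans, corrections):
        out.extend(tokens[pos:start])
        out.extend(fix)
        pos = end
    out.extend(tokens[pos:])
    return out


tokens = "He go to the school yesterday".split()
spans = tags_to_spans([0, 1, 0, 1, 0, 0])  # hypothetical detector output
print(" ".join(annotate(tokens, spans)))
# He <e> go </e> to <e> the </e> school yesterday
print(" ".join(apply_corrections(tokens, spans, [["went"], []])))
# He went to school yesterday
```

Because tokens outside the spans are copied unchanged, the correction model only has to generate text for the marked spans, which is presumably where the entry's reported inference-time savings come from.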