Grammatical Error Generation Based on Translated Fragments
- URL: http://arxiv.org/abs/2104.09933v1
- Date: Tue, 20 Apr 2021 12:43:40 GMT
- Title: Grammatical Error Generation Based on Translated Fragments
- Authors: Eetu Sjöblom, Mathias Creutz, and Teemu Vahtola
- Abstract summary: We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction.
Our method aims at simulating mistakes made by second language learners, and produces a wider range of non-native style language.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We perform neural machine translation of sentence fragments in order to
create large amounts of training data for English grammatical error correction.
Our method aims at simulating mistakes made by second language learners, and
produces a wider range of non-native style language in comparison to
state-of-the-art synthetic data creation methods. In addition to purely
grammatical errors, our approach generates other types of errors, such as
lexical errors. We perform grammatical error correction experiments using
neural sequence-to-sequence models, and carry out quantitative and qualitative
evaluation. A model trained on data created using our proposed method is shown
to outperform a baseline model on test data with a high proportion of errors.
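The abstract gives only the high-level idea; the sketch below is a minimal illustration of fragment-level round-trip translation, assuming a hypothetical translate() helper backed by any NMT system. The pivot language, fragment-sampling policy, and splicing are illustrative choices, not the authors' exact recipe.

```python
import random

def translate(text: str, src: str, tgt: str) -> str:
    # Hypothetical stand-in for an NMT system; identity placeholder so
    # the sketch runs end-to-end. Swap in a real src->tgt model here.
    return text

def corrupt_with_fragment_translation(sentence: str, pivot: str = "fi",
                                      max_frac: float = 0.5) -> str:
    """Round-trip-translate one random fragment of `sentence` and splice
    the result back, leaving the rest of the sentence untouched."""
    tokens = sentence.split()
    if not tokens:
        return sentence
    # Choose a contiguous fragment covering at most `max_frac` of the tokens.
    span = random.randint(1, max(1, int(len(tokens) * max_frac)))
    start = random.randrange(len(tokens) - span + 1)
    fragment = " ".join(tokens[start:start + span])
    # en -> pivot -> en: an imperfect round trip perturbs only the fragment,
    # which is what yields localized, learner-like errors.
    round_trip = translate(translate(fragment, "en", pivot), pivot, "en")
    return " ".join(tokens[:start] + round_trip.split() + tokens[start + span:])
```

Pairing each corrupted sentence with its original then yields (source, target) training pairs for a sequence-to-sequence GEC model.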
Related papers
- Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora [0.0]
Grammatical error correction (GEC) is the task of correcting typos, spelling, punctuation and grammatical issues in text.
We show that a byte-level model enables higher correction quality than a subword approach.
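As a minimal illustration of what "byte-level" means for the input representation (the example string is arbitrary, not from the paper):

```python
# Byte-level segmentation maps every string, including misspellings and
# rare words, onto the same 256-symbol alphabet, so there is no <unk>.
text = "Ég tala íslensku"             # arbitrary non-ASCII example string
byte_ids = list(text.encode("utf-8"))
print(byte_ids[:6])                   # [195, 137, 103, 32, 116, 97]
# A subword tokenizer would split unseen or misspelled words into rare
# pieces; a byte-level model sees a stable, tiny vocabulary instead.
```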
arXiv Detail & Related papers (2023-05-29T06:35:40Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot consider error position and type simultaneously.
We build an FG-TED model to predict addition and omission errors.
Experiments show that our model identifies error type and position concurrently and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- Judge a Sentence by Its Content to Generate Grammatical Errors [0.0]
We propose a learning-based, two-stage method for synthetic data generation for grammatical error correction.
We show that a GEC model trained on our synthetically generated corpus outperforms models trained on synthetic data from prior work.
arXiv Detail & Related papers (2022-08-20T14:31:34Z)
- Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition [52.55136323341319]
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that humans cannot easily recognize them.
arXiv Detail & Related papers (2022-04-15T13:55:32Z)
- Exploring the Capacity of a Large-scale Masked Language Model to Recognize Grammatical Errors [3.55517579369797]
We show that 5 to 10% of the training data is enough for a BERT-based error detection method to achieve performance equivalent to that of a non-language-model-based method.
Using pseudo error data, we also show that the method exhibits these properties when learning rules for recognizing various types of errors.
arXiv Detail & Related papers (2021-08-27T10:37:14Z)
- Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models [15.481446439370343]
We use error type tags from automatic annotation tools such as ERRANT to guide synthetic data generation.
We build a new, large synthetic pre-training data set with error tag frequency distributions matching a given development set.
Our approach is particularly effective in adapting a GEC system, trained on mixed native and non-native English, to a native English test set.
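A toy sketch of the distribution-matching idea: sample error tags with dev-set frequencies and apply a tag-specific corruption. The tag counts and the rule-based corruptor below are illustrative assumptions; the paper uses learned, tag-conditioned corruption models.

```python
import random

# Error-tag counts as ERRANT might produce them on a development set
# (illustrative numbers, not taken from the paper).
dev_tag_counts = {"R:PREP": 120, "M:DET": 90, "U:DET": 60, "R:VERB:SVA": 45}

def make_missing_determiner(tokens):
    # Creates an M:DET error by deleting an article the corrector must restore.
    idx = [i for i, t in enumerate(tokens) if t.lower() in ("a", "an", "the")]
    if idx:
        del tokens[random.choice(idx)]
    return tokens

# Hypothetical per-tag corruptors; one function per tag in a full version.
corruptors = {"M:DET": make_missing_determiner}

def corrupt(sentence: str) -> str:
    tags, weights = zip(*dev_tag_counts.items())
    tag = random.choices(tags, weights=weights, k=1)[0]  # match dev distribution
    fn = corruptors.get(tag, lambda toks: toks)          # no-op for unhandled tags
    return " ".join(fn(sentence.split()))

print(corrupt("The cat sat on the mat ."))
```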
arXiv Detail & Related papers (2021-05-27T17:17:21Z)
- Understanding by Understanding Not: Modeling Negation in Language Models [81.21351681735973]
Negation is a core construction in natural language.
We propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences.
We reduce the mean top-1 error rate to 4% on the negated LAMA dataset.
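The unlikelihood term itself is standard (Welleck et al., 2020). A minimal PyTorch sketch, assuming `logits` come from any language model and `targets` holds token ids of a negated continuation the model should avoid:

```python
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Penalize probability mass on tokens the model should NOT produce:
    loss = -log(1 - p(target)), averaged over positions."""
    log_probs = F.log_softmax(logits, dim=-1)            # (batch, seq, vocab)
    p_tgt = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1).exp()
    p_tgt = p_tgt.clamp(max=1.0 - 1e-6)                  # avoid log(0)
    return -torch.log1p(-p_tgt).mean()
```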
arXiv Detail & Related papers (2021-05-07T21:58:35Z)
- Neural Text Generation with Artificial Negative Examples [7.187858820534111]
We propose to suppress an arbitrary type of error by training the text generation model in a reinforcement learning framework.
We use a trainable reward function that is capable of discriminating between references and sentences containing the targeted type of errors.
The experimental results show that our method can suppress the generation errors and achieve significant improvements on two machine translation and two image captioning tasks.
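A bare-bones sketch of such a training step in REINFORCE style; `generator.sample` and `reward_model` are assumed interfaces standing in for the paper's text generator and trainable reward function:

```python
import torch

def policy_gradient_step(generator, reward_model, src_batch, optimizer):
    # Sample outputs and their per-token log-probabilities (assumed API).
    samples, log_probs = generator.sample(src_batch)
    with torch.no_grad():
        rewards = reward_model(src_batch, samples)  # high reward = error-free
        rewards = rewards - rewards.mean()          # variance-reducing baseline
    # REINFORCE: push up the likelihood of high-reward samples.
    loss = -(rewards * log_probs.sum(dim=-1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```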
arXiv Detail & Related papers (2020-12-28T07:25:10Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected, but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
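One common way to realize this with an off-the-shelf pretrained LM; the label serialization and decoding settings below are assumptions, and in practice the LM would first be fine-tuned on in-domain utterances:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical serialization: condition generation on the SLU label.
prompt = "intent=book_flight | "
inputs = tok(prompt, return_tensors="pt")
out = lm.generate(**inputs, do_sample=True, top_p=0.9,
                  num_return_sequences=5, max_new_tokens=20,
                  pad_token_id=tok.eos_token_id)
for seq in out:
    print(tok.decode(seq, skip_special_tokens=True))  # candidate utterances
```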
arXiv Detail & Related papers (2020-04-29T04:07:12Z)