Correcting the Autocorrect: Context-Aware Typographical Error Correction
via Training Data Augmentation
- URL: http://arxiv.org/abs/2005.01158v1
- Date: Sun, 3 May 2020 18:08:17 GMT
- Title: Correcting the Autocorrect: Context-Aware Typographical Error Correction
via Training Data Augmentation
- Authors: Kshitij Shah, Gerard de Melo
- Abstract summary: We first draw on a small set of annotated data to compute spelling error statistics.
These are then invoked to introduce errors into substantially larger corpora.
We use it to create a set of English language error detection and correction datasets.
- Score: 38.10429793534442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we explore the artificial generation of typographical errors
based on real-world statistics. We first draw on a small set of annotated data
to compute spelling error statistics. These are then invoked to introduce
errors into substantially larger corpora. The generation methodology allows us
to generate particularly challenging errors that require context-aware error
detection. We use it to create a set of English language error detection and
correction datasets. Finally, we examine the effectiveness of machine learning
models for detecting and correcting errors based on this data. The datasets are
available at http://typo.nlproc.org
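The abstract's pipeline — estimate error statistics from a small annotated set, then use them to inject typos into a larger clean corpus — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the operation probabilities below are invented placeholders standing in for statistics one would actually estimate from annotated data, and the character-level edit operations are a common simplification of real typo models.

```python
import random

# Hypothetical per-operation probabilities, as might be estimated from a
# small annotated corpus. These numbers are illustrative only.
ERROR_STATS = {
    "substitute": 0.40,
    "delete": 0.25,
    "insert": 0.20,
    "transpose": 0.15,
}

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def inject_typo(word, stats, rng=random):
    """Apply one edit operation to `word`, with the operation sampled
    according to the estimated probabilities in `stats`."""
    if len(word) < 2:
        return word  # too short to corrupt meaningfully
    ops, weights = zip(*stats.items())
    op = rng.choices(ops, weights=weights, k=1)[0]
    i = rng.randrange(len(word) - 1)  # edit position
    if op == "substitute":
        return word[:i] + rng.choice(ALPHABET) + word[i + 1:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    # transpose: swap adjacent characters
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]
```

Applied word-by-word over a large clean corpus, this yields paired (corrupted, clean) training data for error detection and correction models, which is the role the generated datasets play in the paper.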
Related papers
- Assessing the Efficacy of Grammar Error Correction: A Human Evaluation
Approach in the Japanese Context [10.047123247001714]
We evaluate the performance of the state-of-the-art sequence tagging grammar error detection and correction model (SeqTagger).
With an automatic annotation toolkit, ERRANT, we first evaluated SeqTagger's performance on error correction with human expert correction as the benchmark.
Results indicated a precision of 63.66% and a recall of 20.19% for error correction in the full dataset.
arXiv Detail & Related papers (2024-02-28T06:43:43Z) - Parameter-tuning-free data entry error unlearning with adaptive
selective synaptic dampening [51.34904967046097]
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning.
We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD) on various ResNet18 and Vision Transformer unlearning tasks.
The application of this approach is particularly compelling in industrial settings, such as supply chain management.
arXiv Detail & Related papers (2024-02-06T14:04:31Z) - Towards Fine-Grained Information: Identifying the Type and Location of
Translation Errors [80.22825549235556]
Existing approaches cannot simultaneously consider error position and type.
We build an FG-TED model to predict both addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z) - Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese
Grammatical Error Correction [49.25830718574892]
We present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction.
It builds on the observation that most tokens are correct and can be conveyed directly from source to target, while the error positions can be estimated and corrected.
Experimental results on standard datasets, especially on the variable-length datasets, demonstrate the effectiveness of TtT in terms of sentence-level Accuracy, Precision, Recall, and F1-Measure.
arXiv Detail & Related papers (2021-06-03T05:56:57Z) - Grammatical Error Generation Based on Translated Fragments [0.0]
We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction.
Our method aims at simulating mistakes made by second language learners, and produces a wider range of non-native style language.
arXiv Detail & Related papers (2021-04-20T12:43:40Z) - Defuse: Harnessing Unrestricted Adversarial Examples for Debugging
Models Beyond Test Accuracy [11.265020351747916]
Defuse is a method to automatically discover and correct model errors beyond those available in test data.
We propose an algorithm inspired by adversarial machine learning techniques that uses a generative model to find naturally occurring instances misclassified by a model.
Defuse corrects the error after fine-tuning while maintaining generalization on the test set.
arXiv Detail & Related papers (2021-02-11T18:08:42Z) - Deep Neural Network: An Efficient and Optimized Machine Learning
Paradigm for Reducing Genome Sequencing Error [27.84400682210533]
Most of the platforms used in the sequencing process are known to produce significant errors.
Of the two main types of genome errors, substitutions and indels, our work focuses on correcting indels.
A deep learning approach was used to correct the errors in sequencing the chosen dataset.
arXiv Detail & Related papers (2020-10-06T08:16:35Z) - On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z) - Towards Minimal Supervision BERT-based Grammar Error Correction [81.90356787324481]
We try to incorporate contextual information from a pre-trained language model to reduce annotation effort and benefit multilingual scenarios.
Results show strong potential of Bidirectional Representations from Transformers (BERT) in grammatical error correction task.
arXiv Detail & Related papers (2020-01-10T15:45:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.