Real-Word Error Correction with Trigrams: Correcting Multiple Errors in a Sentence
- URL: http://arxiv.org/abs/2302.04096v1
- Date: Tue, 7 Feb 2023 13:52:14 GMT
- Title: Real-Word Error Correction with Trigrams: Correcting Multiple Errors in a Sentence
- Authors: Seyed MohammadSadegh Dashti
- Abstract summary: We propose a new variation which focuses on detecting and correcting multiple real-word errors in a sentence.
We test our approach on the Wall Street Journal corpus and show that it outperforms Hirst and Budanitsky's WordNet-based method and Wilcox-O'Hearn, Hirst, and Budanitsky's fixed window-size method.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spelling correction is a fundamental task in Text Mining. In this study, we
assess the real-word error correction model proposed by Mays, Damerau and
Mercer and describe several drawbacks of the model. We propose a new variation
which focuses on detecting and correcting multiple real-word errors in a
sentence, by manipulating a Probabilistic Context-Free Grammar (PCFG) to
discriminate between items in the search space. We test our approach on the
Wall Street Journal corpus and show that it outperforms Hirst and Budanitsky's
WordNet-based method and Wilcox-O'Hearn, Hirst, and Budanitsky's fixed
window-size method.
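
The Mays, Damerau and Mercer model assessed here is a noisy-channel trigram model: each single-word variation of a sentence is ranked by its trigram language-model probability times a channel probability that the typist produced the observed word. A minimal sketch under toy assumptions (the trigram table, confusion sets, and the ALPHA prior are illustrative; the paper's variation further handles multiple errors per sentence and uses a PCFG to discriminate between candidates, which is not reproduced here):

```python
ALPHA = 0.99  # illustrative prior that an observed word was intended as-is

# Toy trigram LM with a floor for unseen trigrams; a real system would
# estimate these from a large corpus such as the Wall Street Journal.
TRIGRAM_P = {
    ("<s>", "the", "weather"): 1e-4,
    ("the", "weather", "is"): 1e-3,
    ("weather", "is", "nice"): 1e-3,
    ("<s>", "the", "whether"): 1e-7,
    ("the", "whether", "is"): 1e-8,
    ("whether", "is", "nice"): 1e-8,
}
FLOOR = 1e-9

# Confusion sets: real words reachable by a single edit (illustrative).
CONFUSION = {"whether": {"weather"}, "weather": {"whether"}}

def lm_prob(words):
    """Trigram probability of a padded word sequence."""
    padded = ["<s>", "<s>"] + words + ["</s>"]
    p = 1.0
    for i in range(2, len(padded)):
        p *= TRIGRAM_P.get(tuple(padded[i - 2:i + 1]), FLOOR)
    return p

def correct(sentence):
    """Return the best-scoring single-word variation of the sentence."""
    words = sentence.lower().split()
    # Leaving every word unchanged costs ALPHA per word.
    best, best_score = words, lm_prob(words) * ALPHA ** len(words)
    for i, w in enumerate(words):
        for cand in CONFUSION.get(w, ()):
            variation = words[:i] + [cand] + words[i + 1:]
            # Channel: the typist intended `cand` but produced `w`.
            channel = (1 - ALPHA) / len(CONFUSION[cand])
            score = lm_prob(variation) * ALPHA ** (len(words) - 1) * channel
            if score > best_score:
                best, best_score = variation, score
    return " ".join(best)

print(correct("the whether is nice"))  # -> "the weather is nice"
```

Raising ALPHA makes the corrector more conservative, since the prior on leaving words unchanged grows; that trade-off is one of the knobs the original model exposes.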
Related papers
- Automatic Real-word Error Correction in Persian Text [0.0]
This paper introduces a cutting-edge approach for precise and efficient real-word error correction in Persian text.
We employ semantic analysis, feature selection, and advanced classifiers to enhance error detection and correction efficacy.
Our method achieves an impressive F-measure of 96.6% in the detection phase and an accuracy of 99.1% in the correction phase.
arXiv Detail & Related papers (2024-07-20T07:50:52Z)
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
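
Premise mines richer conjunctive patterns than this, but the underlying signal can be illustrated with a single-token baseline: rank tokens by how much more often they appear in misclassified inputs than in correctly classified ones. A hedged sketch (the smoothed log-odds scoring and the toy data are illustrative, not the paper's method):

```python
import math
from collections import Counter

def token_error_indicators(correct_docs, error_docs, smoothing=1.0):
    """Rank tokens by smoothed log-odds of appearing in misclassified
    inputs versus correctly classified ones (a single-token baseline;
    Premise itself mines richer patterns over tokens)."""
    ok = Counter(t for d in correct_docs for t in set(d.split()))
    bad = Counter(t for d in error_docs for t in set(d.split()))
    n_ok, n_bad = len(correct_docs), len(error_docs)
    scores = {}
    for tok in set(ok) | set(bad):
        p_bad = (bad[tok] + smoothing) / (n_bad + 2 * smoothing)
        p_ok = (ok[tok] + smoothing) / (n_ok + 2 * smoothing)
        scores[tok] = math.log(p_bad / p_ok)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy usage: a token like "n't" surfaces as an error indicator.
correct = ["the movie was great", "a fine film"]
errors = ["the movie was n't bad", "was n't a waste"]
print(token_error_indicators(correct, errors)[:3])
```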
arXiv Detail & Related papers (2023-11-18T00:24:26Z)
- Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct the potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
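
The contrast with sequence tagging is easiest to see in how a training pair is laid out: instead of predicting one label per source character, the model sees the source followed by mask slots and regenerates the whole corrected sentence in those slots. A minimal sketch (the slot layout, [MASK] token, and -100 ignore-label are illustrative stand-ins, not the paper's exact format):

```python
MASK = "[MASK]"

def relm_training_pair(source, target):
    """Build a rephrasing-style training pair: the input is the source
    sentence followed by one mask slot per target character, and the
    model is supervised only on the slots, where it regenerates the
    entire corrected sentence rather than tagging source characters."""
    input_tokens = list(source) + [MASK] * len(target)
    # -100 is the conventional ignore index: no loss on the source prefix.
    labels = [-100] * len(source) + list(target)
    return input_tokens, labels

# '做' is a common confusion for '作'; the whole sentence is regenerated.
tokens, labels = relm_training_pair("他的工做很忙", "他的工作很忙")
print(tokens)
print(labels)
```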
arXiv Detail & Related papers (2023-08-17T06:04:28Z)
- Persian Typographical Error Type Detection Using Deep Neural Networks on Algorithmically-Generated Misspellings [2.2503811834154104]
Typographical Error Type Detection in Persian is a relatively understudied area.
This paper presents a compelling approach for detecting typographical errors in Persian texts.
The outcomes of our final method proved to be highly competitive, achieving an accuracy of 97.62%, precision of 98.83%, recall of 98.61%, and surpassing others in terms of speed.
arXiv Detail & Related papers (2023-05-19T15:05:39Z)
- Correcting Real-Word Spelling Errors: A New Hybrid Approach [1.5469452301122175]
A new hybrid approach is proposed which relies on statistical and syntactic knowledge to detect and correct real-word errors.
The model can prove more practical than some other models, such as the WordNet-based method of Hirst and Budanitsky and the fixed window-size method of Wilcox-O'Hearn and Hirst.
arXiv Detail & Related papers (2023-02-09T06:03:11Z)
- A Simple and Practical Approach to Improve Misspellings in OCR Text [0.0]
This paper focuses on the identification and correction of non-word errors in OCR text.
Traditional N-gram correction methods can handle single-word errors effectively.
In this paper, we develop an unsupervised method that can handle split and merge errors.
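
Split errors ("mainte nance") and merge errors ("rememberthe") both move token boundaries, which is what defeats word-level N-gram correction. A minimal greedy sketch of boundary repair against a vocabulary (the lexicon and the greedy policy are illustrative; the paper's unsupervised method is more careful than a dictionary check):

```python
# A tiny lexicon stands in for the vocabulary an unsupervised system
# would derive from the corpus itself.
VOCAB = {"remember", "the", "maintenance", "of", "quick", "fox"}

def repair_split_merge(tokens, vocab):
    """Greedy left-to-right repair: join two adjacent out-of-vocabulary
    fragments whose concatenation is a known word (split error), and
    divide an OOV token into two known words (merge error)."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        # Split error: "mainte nance" -> "maintenance"
        if (tok not in vocab and i + 1 < len(tokens)
                and tok + tokens[i + 1] in vocab):
            out.append(tok + tokens[i + 1])
            i += 2
            continue
        # Merge error: "rememberthe" -> "remember the"
        if tok not in vocab:
            for j in range(1, len(tok)):
                if tok[:j] in vocab and tok[j:] in vocab:
                    out.extend([tok[:j], tok[j:]])
                    break
            else:
                out.append(tok)  # leave unrepairable tokens untouched
            i += 1
            continue
        out.append(tok)
        i += 1
    return out

print(repair_split_merge("mainte nance of the quick fox".split(), VOCAB))
print(repair_split_merge("rememberthe quick fox".split(), VOCAB))
```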
arXiv Detail & Related papers (2021-06-22T19:38:17Z)
- Learning by Fixing: Solving Math Word Problems with Weak Supervision [70.62896781438694]
Previous neural solvers of math word problems (MWPs) are learned with full supervision and fail to generate diverse solutions.
We introduce a weakly-supervised paradigm for learning MWPs.
Our method only requires the annotations of the final answers and can generate various solutions for a single problem.
arXiv Detail & Related papers (2020-12-19T03:10:21Z)
- NeuSpell: A Neural Spelling Correction Toolkit [88.79419580807519]
NeuSpell is an open-source toolkit for spelling correction in English.
It comprises ten different models, and benchmarks them on misspellings from multiple sources.
We train neural models using spelling errors in context, synthetically constructed by reverse engineering isolated misspellings.
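
One way to realize that construction is a lookup from correct words to their isolated misspellings, applied to clean sentences so the errors appear in context. A minimal sketch (the MISSPELLINGS table and replacement rate are illustrative; NeuSpell combines several such noising strategies):

```python
import random

# A lookup of isolated misspellings (e.g. harvested from spelling-error
# corpora); the entries here are illustrative stand-ins.
MISSPELLINGS = {
    "receive": ["recieve", "receve"],
    "separate": ["seperate"],
    "definitely": ["definately", "definetly"],
}

def inject_noise(sentence, rate=0.3, rng=random):
    """Create a (noisy, clean) training pair by swapping words for known
    misspellings in context, so a corrector learns from contextual errors."""
    noisy = []
    for word in sentence.split():
        if word in MISSPELLINGS and rng.random() < rate:
            noisy.append(rng.choice(MISSPELLINGS[word]))
        else:
            noisy.append(word)
    return " ".join(noisy), sentence

random.seed(0)
print(inject_noise("i will definitely receive the separate package", rate=1.0))
```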
arXiv Detail & Related papers (2020-10-21T15:53:29Z)
- Tokenization Repair in the Presence of Spelling Errors [0.2964978357715083]
Spelling errors may be present in the input, but correcting them is not part of the problem.
We identify three key ingredients of high-quality tokenization repair.
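
Tokenization repair itself amounts to deciding where spaces belong. One standard baseline, sketched here under toy assumptions (the unigram word costs are illustrative, and the paper combines richer ingredients), is to drop all existing spaces and recover the cheapest segmentation by dynamic programming:

```python
import math

# Unigram word costs (negative log probabilities); illustrative values.
WORD_COST = {"to": 2.0, "get": 2.5, "her": 3.0, "together": 3.5,
             "we": 2.0, "go": 2.5}
UNKNOWN = 20.0  # per-chunk penalty for out-of-vocabulary substrings

def repair_tokenization(text, max_len=12):
    """Repair missing and spurious spaces: drop all existing spaces,
    then recover the cheapest segmentation with Viterbi over word
    boundaries (a unigram stand-in for stronger language models)."""
    s = text.replace(" ", "")
    n = len(s)
    best = [math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            cost = best[j] + WORD_COST.get(s[j:i], UNKNOWN)
            if cost < best[i]:
                best[i], back[i] = cost, j
    words, i = [], n
    while i > 0:
        words.append(s[back[i]:i])
        i = back[i]
    return " ".join(reversed(words))

print(repair_tokenization("toget her"))     # -> "together"
print(repair_tokenization("wegotogether"))  # -> "we go together"
```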
arXiv Detail & Related papers (2020-10-15T16:55:45Z)
- Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction [106.63733511672721]
We propose a novel language-independent approach to improve the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. ESC leverages a seq2seq model to take the sentence with annotated erroneous spans as input and only outputs the corrected text for these spans.
Experiments show our approach performs comparably to conventional seq2seq approaches in both English and Chinese GEC benchmarks with less than 50% time cost for inference.
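
The efficiency argument follows from the division of labor: the cheap tagger sees every sentence, while the expensive seq2seq model only generates text for flagged spans. A minimal sketch of the pipeline wiring (the detector and corrector here are toy stand-ins, and the span-annotation format is illustrative):

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # [start, end) token indices

def gec_pipeline(tokens: List[str],
                 detect: Callable[[List[str]], List[Span]],
                 correct: Callable[[str], List[str]]) -> List[str]:
    """Two-stage GEC: a tagging model (ESD) flags erroneous spans; a
    seq2seq model (ESC) sees the sentence with spans annotated and emits
    corrections only for those spans, which are spliced back in."""
    spans = detect(tokens)
    if not spans:
        return tokens  # fast path: no seq2seq call for clean sentences
    annotated = []  # mark spans inline, e.g. "... [S0] a apple [/S0] ..."
    for idx, tok in enumerate(tokens):
        for k, (s, _) in enumerate(spans):
            if idx == s:
                annotated.append(f"[S{k}]")
        annotated.append(tok)
        for k, (_, e) in enumerate(spans):
            if idx == e - 1:
                annotated.append(f"[/S{k}]")
    corrections = correct(" ".join(annotated))  # one output per span
    out, i = [], 0
    for (s, e), fix in zip(spans, corrections):
        out += tokens[i:s] + fix.split()
        i = e
    return out + tokens[i:]

# Toy stand-ins for the two models (illustrative, not the paper's):
detect = lambda toks: [(2, 4)] if toks[2:4] == ["a", "apple"] else []
correct = lambda annotated: ["an apple"]
print(gec_pipeline("she ate a apple today".split(), detect, correct))
# -> ['she', 'ate', 'an', 'apple', 'today']
```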
arXiv Detail & Related papers (2020-10-07T08:29:11Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected, but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.