Correcting Real-Word Spelling Errors: A New Hybrid Approach
- URL: http://arxiv.org/abs/2302.06407v1
- Date: Thu, 9 Feb 2023 06:03:11 GMT
- Title: Correcting Real-Word Spelling Errors: A New Hybrid Approach
- Authors: Seyed MohammadSadegh Dashti, Amid Khatibi Bardsiri, Vahid Khatibi Bardsiri
- Abstract summary: A new hybrid approach is proposed which relies on statistical and syntactic knowledge to detect and correct real-word errors.
The model proves to be more practical than some other models, such as the WordNet-based method of Hirst and Budanitsky and the fixed-window-size method of Wilcox-O'Hearn and Hirst.
- Score: 1.5469452301122175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spelling correction is one of the main tasks in the field of Natural Language
Processing. Unlike common spelling errors, real-word errors cannot be detected by
conventional spelling correction methods. The real-word correction model proposed by
Mays, Damerau and Mercer has performed well in a number of evaluations. In this
research, however, a new hybrid approach is proposed that relies on statistical and
syntactic knowledge to detect and correct real-word errors. In this model, Constraint
Grammar (CG) is used to discriminate among the sets of correction candidates in the
search space, and Mays, Damerau and Mercer's trigram approach is adapted to estimate
the probability of the syntactically well-formed correction candidates. The approach
is tested on the Wall Street Journal corpus, and the model proves to be more practical
than some other models, such as the WordNet-based method of Hirst and Budanitsky and
the fixed-window-size method of Wilcox-O'Hearn and Hirst.
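To make the pipeline concrete, here is a minimal Python sketch of a Mays-Damerau-Mercer-style noisy-channel trigram corrector with a toy syntactic filter standing in for the Constraint Grammar step. Everything below is an assumption for illustration, not the authors' implementation: the `edits1`, `passes_syntactic_filter`, and `trigram_logprob` helpers, the add-one smoothing, and the alpha values are all placeholders.

```python
import math
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word, vocabulary):
    """Real-word neighbours of `word`: vocabulary entries at edit distance 1."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return ((deletes | inserts | replaces | transposes) & vocabulary) - {word}

def trigram_logprob(words, tri, bi, vocab_size):
    """Add-one smoothed trigram log-probability (a toy stand-in for the LM)."""
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    logp = 0.0
    for i in range(2, len(padded)):
        trigram, bigram = tuple(padded[i - 2:i + 1]), tuple(padded[i - 2:i])
        logp += math.log((tri[trigram] + 1) / (bi[bigram] + vocab_size))
    return logp

def passes_syntactic_filter(words):
    """Toy stand-in for the Constraint Grammar step: reject candidates with two
    determiners in a row. The paper applies full CG disambiguation rules instead."""
    dets = {"a", "an", "the"}
    return not any(w1 in dets and w2 in dets for w1, w2 in zip(words, words[1:]))

def correct(words, vocabulary, tri, bi, alpha=0.99):
    """Noisy-channel real-word correction in the spirit of Mays, Damerau and
    Mercer: try replacing one word at a time, drop syntactically ill-formed
    candidates, and keep the highest-scoring sentence overall."""
    vocab_size = len(vocabulary)
    best = list(words)
    # The original sentence keeps channel probability alpha.
    best_score = math.log(alpha) + trigram_logprob(words, tri, bi, vocab_size)
    for i, word in enumerate(words):
        neighbours = edits1(word, vocabulary)
        for cand in neighbours:
            candidate = list(words)
            candidate[i] = cand
            if not passes_syntactic_filter(candidate):
                continue
            # The remaining (1 - alpha) mass is shared uniformly among neighbours.
            score = (math.log((1 - alpha) / len(neighbours))
                     + trigram_logprob(candidate, tri, bi, vocab_size))
            if score > best_score:
                best, best_score = candidate, score
    return best

# Toy usage with counts from a tiny reference corpus (illustrative only).
corpus = "the boys run to school . the boy runs home .".split()
vocabulary = set(corpus)
padded = ["<s>", "<s>"] + corpus + ["</s>"]
tri = Counter(tuple(padded[i:i + 3]) for i in range(len(padded) - 2))
bi = Counter(tuple(padded[i:i + 2]) for i in range(len(padded) - 1))
# With a permissive alpha the model should prefer "run" over the real-word
# error "runs", restoring subject-verb agreement.
print(correct("the boys runs to school .".split(), vocabulary, tri, bi, alpha=0.7))
```

A real implementation would replace the toy filter with the Constraint Grammar rules and the add-one trigram model with the language model described in the paper; the candidate-by-candidate search, however, follows the general noisy-channel recipe.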
Related papers
- Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
arXiv Detail & Related papers (2024-10-09T07:43:38Z)
- Automatic Real-word Error Correction in Persian Text [0.0]
This paper introduces a cutting-edge approach for precise and efficient real-word error correction in Persian text.
We employ semantic analysis, feature selection, and advanced classifiers to enhance error detection and correction efficacy.
Our method achieves an impressive F-measure of 96.6% in the detection phase and an accuracy of 99.1% in the correction phase.
arXiv Detail & Related papers (2024-07-20T07:50:52Z)
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
arXiv Detail & Related papers (2023-11-18T00:24:26Z)
- Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
arXiv Detail & Related papers (2023-08-17T06:04:28Z)
- Persian Typographical Error Type Detection Using Deep Neural Networks on Algorithmically-Generated Misspellings [2.2503811834154104]
Typographical Error Type Detection in Persian is a relatively understudied area.
This paper presents a compelling approach for detecting typographical errors in Persian texts.
The outcomes of our final method proved to be highly competitive, achieving an accuracy of 97.62%, precision of 98.83%, recall of 98.61%, and surpassing others in terms of speed.
arXiv Detail & Related papers (2023-05-19T15:05:39Z)
- Real-Word Error Correction with Trigrams: Correcting Multiple Errors in a Sentence [0.0]
We propose a new variation which focuses on detecting and correcting multiple real-word errors in a sentence.
We test our approach on the Wall Street Journal corpus and show that it outperforms Hirst and Budanitsky's WordNet-based method and Wilcox-O'Hearn, Hirst, and Budanitsky's fixed-window-size method.
arXiv Detail & Related papers (2023-02-07T13:52:14Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is a task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public benchmarks of GEC task and it achieves competitive results.
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- A Simple and Practical Approach to Improve Misspellings in OCR Text [0.0]
This paper focuses on the identification and correction of non-word errors in OCR text.
Traditional N-gram correction methods can handle single-word errors effectively.
In this paper, we develop an unsupervised method that can handle split and merge errors.
arXiv Detail & Related papers (2021-06-22T19:38:17Z)
- NeuSpell: A Neural Spelling Correction Toolkit [88.79419580807519]
NeuSpell is an open-source toolkit for spelling correction in English.
It comprises ten different models and benchmarks them on misspellings from multiple sources.
We train neural models using spelling errors in context, synthetically constructed by reverse engineering isolated misspellings.
arXiv Detail & Related papers (2020-10-21T15:53:29Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected, but the degree of impact varies; a toy sketch of this kind of error injection follows below.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
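To give a flavour of the error-simulation idea in this last entry, the snippet below is a small hypothetical sketch that injects two common grammatical error types (article deletion and subject-verb agreement swaps) into clean tokenized text. The rules, rates, and names are illustrative assumptions, not the attack procedure used in the paper.

```python
import random

# Illustrative error types only; the paper derives its perturbations from
# real mistakes made by non-native speakers.
ARTICLES = {"a", "an", "the"}
AGREEMENT_SWAPS = {"is": "are", "are": "is", "was": "were", "were": "was",
                   "has": "have", "have": "has"}

def perturb(tokens, p=0.3, seed=None):
    """Simulate grammatical errors on clean, tokenized text by randomly
    dropping articles or swapping verb agreement forms."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        low = tok.lower()
        if low in ARTICLES and rng.random() < p:
            continue                             # article deletion
        if low in AGREEMENT_SWAPS and rng.random() < p:
            out.append(AGREEMENT_SWAPS[low])     # agreement error
            continue
        out.append(tok)
    return out

# With p=1.0 every eligible token is perturbed:
# ['results', 'is', 'promising', 'and', 'model', 'have', 'improved']
print(perturb("The results are promising and the model has improved".split(),
              p=1.0, seed=0))
```

Perturbed copies of clean sentences like these can then be fed to the encoders under test to measure how much their downstream performance degrades.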