Misspelling Correction with Pre-trained Contextual Language Model
- URL: http://arxiv.org/abs/2101.03204v1
- Date: Fri, 8 Jan 2021 20:11:01 GMT
- Title: Misspelling Correction with Pre-trained Contextual Language Model
- Authors: Yifei Hu, Xiaonan Jing, Youlim Ko, Julia Taylor Rayz
- Abstract summary: We present two experiments, based on BERT and the edit distance algorithm, for ranking and selecting candidate corrections.
The results of our experiments demonstrated that when combined properly, contextual word embeddings of BERT and edit distance are capable of effectively correcting spelling errors.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Spelling irregularities, known now as spelling mistakes, have been found for
several centuries. As humans, we are able to understand most of the misspelled
words based on their location in the sentence, perceived pronunciation, and
context. Unlike humans, computer systems do not possess the convenient
autocomplete functionality of which human brains are capable. While many
programs provide spelling correction functionality, many systems do not take
context into account. Moreover, Artificial Intelligence systems function in
the way they are trained. With many current Natural Language Processing (NLP)
systems trained on grammatically correct text data, many are vulnerable to
adversarial examples, yet correctly spelled text processing is crucial for
learning. In this paper, we investigate how spelling errors can be corrected in
context, with a pre-trained language model BERT. We present two experiments,
based on BERT and the edit distance algorithm, for ranking and selecting
candidate corrections. The results of our experiments demonstrated that when
combined properly, contextual word embeddings of BERT and edit distance are
capable of effectively correcting spelling errors.
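The paper does not publish its exact scoring formula here, but the idea of combining a contextual probability with edit distance can be sketched as follows. The `candidates` dictionary stands in for hypothetical BERT masked-LM probabilities (in practice these would come from masking the misspelled token and reading the model's output distribution); the blend weight `alpha` is an assumed parameter, not the paper's value.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via single-row dynamic programming."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (a[i - 1] != b[j - 1]))   # substitution
            prev = cur
    return dp[n]

def rank_candidates(misspelled: str, candidates: dict, alpha: float = 0.5):
    """Rank candidates by a weighted blend of contextual fit and edit distance.

    `candidates` maps word -> contextual probability (e.g. a masked-LM
    score for that word in the sentence slot)."""
    scored = []
    for word, ctx_prob in candidates.items():
        dist = edit_distance(misspelled, word)
        dist_score = 1.0 / (1.0 + dist)   # lower distance -> higher score
        scored.append((alpha * ctx_prob + (1 - alpha) * dist_score, word))
    return [w for _, w in sorted(scored, reverse=True)]

# Hypothetical masked-LM probabilities for the slot in
# "I would like a cup of cofee", with the misspelled token masked.
cands = {"coffee": 0.62, "toffee": 0.05, "cover": 0.01}
print(rank_candidates("cofee", cands))   # best candidate first
```

A real system would also restrict the candidate set (e.g. to dictionary words within a small edit distance) before scoring, since computing masked-LM probabilities for the whole vocabulary is wasteful.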
Related papers
- A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance [1.7000578646860536]
Spelling mistakes, among the most prevalent writing errors, are frequently encountered due to various factors.
This research aims to identify and rectify diverse spelling errors in text using neural networks.
arXiv Detail & Related papers (2024-07-24T16:07:11Z)
- Automatic Real-word Error Correction in Persian Text [0.0]
This paper introduces a cutting-edge approach for precise and efficient real-word error correction in Persian text.
We employ semantic analysis, feature selection, and advanced classifiers to enhance error detection and correction efficacy.
Our method achieves an impressive F-measure of 96.6% in the detection phase and an accuracy of 99.1% in the correction phase.
arXiv Detail & Related papers (2024-07-20T07:50:52Z)
- Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct the potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
arXiv Detail & Related papers (2023-08-17T06:04:28Z)
- Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition [52.55136323341319]
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that even humans cannot easily recognize them.
arXiv Detail & Related papers (2022-04-15T13:55:32Z)
- A Proposal of Automatic Error Correction in Text [0.0]
We present an application for automatic recognition and correction of orthographic errors in electronic texts.
The proposal is based on part-of-speech text categorization, word similarity, word dictionaries, statistical measures, morphological analysis, and an n-gram-based language model of Spanish.
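The n-gram language-model component mentioned above can be illustrated with a toy bigram model: candidate corrections are scored by how probable they are after the preceding word. The corpus, vocabulary, and smoothing choice here are illustrative assumptions, not the paper's actual setup.

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams from a list of tokenized sentences."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        bi.update(zip(sent, sent[1:]))
    return uni, bi

def bigram_prob(uni, bi, prev, word, vocab_size):
    """P(word | prev) with Laplace (add-one) smoothing."""
    return (bi[(prev, word)] + 1) / (uni[prev] + vocab_size)

# Toy Spanish corpus (hypothetical); a real system trains on large text.
corpus = [["una", "taza", "de", "cafe"],
          ["una", "taza", "de", "te"],
          ["un", "vaso", "de", "agua"]]
uni, bi = train_bigram(corpus)
V = len(uni)

# Which candidate better continues "de ..." for a garbled token?
p_cafe = bigram_prob(uni, bi, "de", "cafe", V)
p_cava = bigram_prob(uni, bi, "de", "cava", V)
```

In the full proposal this score would be one signal among several (dictionary lookup, word similarity, morphological analysis), not the sole decision criterion.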
arXiv Detail & Related papers (2021-09-24T17:17:56Z)
- Word Alignment by Fine-tuning Embeddings on Parallel Corpora [96.28608163701055]
Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs.
Recently, other work has demonstrated that pre-trained contextualized word embeddings derived from multilingually trained language models (LMs) prove an attractive alternative, achieving competitive results on the word alignment task even in the absence of explicit training on parallel data.
In this paper, we examine methods to marry the two approaches: leveraging pre-trained LMs but fine-tuning them on parallel text with objectives designed to improve alignment quality.
arXiv Detail & Related papers (2021-01-20T17:54:47Z)
- NeuSpell: A Neural Spelling Correction Toolkit [88.79419580807519]
NeuSpell is an open-source toolkit for spelling correction in English.
It comprises ten different models, and benchmarks them on misspellings from multiple sources.
We train neural models using spelling errors in context, synthetically constructed by reverse engineering isolated misspellings.
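One simple way to construct such in-context training errors is to apply character-level perturbations to clean tokens, keeping the clean sentence as the supervision label. This is only a crude stand-in for NeuSpell's reverse-engineered misspelling patterns; the operations and corruption rate below are assumptions for illustration.

```python
import random

def perturb(word, rng):
    """Apply one random character-level edit: drop, duplicate, or swap.
    A crude stand-in for realistic misspelling patterns."""
    if len(word) < 3:
        return word
    i = rng.randrange(1, len(word) - 1)
    op = rng.choice(["drop", "dup", "swap"])
    if op == "drop":
        return word[:i] + word[i + 1:]
    if op == "dup":
        return word[:i] + word[i] + word[i:]
    return word[:i - 1] + word[i] + word[i - 1] + word[i + 1:]  # swap

def corrupt_sentence(tokens, rate, seed=0):
    """Corrupt a fraction of tokens; the clean tokens serve as labels."""
    rng = random.Random(seed)
    return [perturb(t, rng) if rng.random() < rate else t for t in tokens]

clean = ["spelling", "errors", "in", "context"]
noisy = corrupt_sentence(clean, rate=0.5)
```

Training on (noisy, clean) pairs teaches the model to use surrounding context, which isolated-word spell checkers cannot do.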
arXiv Detail & Related papers (2020-10-21T15:53:29Z)
- Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction [106.63733511672721]
We propose a novel language-independent approach to improve the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. ESC leverages a seq2seq model to take the sentence with annotated erroneous spans as input and only outputs the corrected text for these spans.
Experiments show our approach performs comparably to conventional seq2seq approaches in both English and Chinese GEC benchmarks with less than 50% time cost for inference.
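The efficiency gain comes from rewriting only the detected spans and copying everything else verbatim. A minimal sketch of that second stage, with a lookup table standing in for the seq2seq span corrector (the function name and data are hypothetical):

```python
def correct_with_spans(tokens, spans, span_corrector):
    """Two-stage GEC sketch: a detector supplies erroneous (start, end)
    spans, and the corrector rewrites only those spans, leaving the
    rest of the sentence untouched."""
    out, i = [], 0
    for start, end in sorted(spans):
        out.extend(tokens[i:start])                     # copy clean text
        out.extend(span_corrector(tokens[start:end]))   # rewrite the span
        i = end
    out.extend(tokens[i:])
    return out

# Hypothetical corrections keyed by the erroneous span text; a real
# system would run a seq2seq model over the annotated input instead.
fixes = {("has", "went"): ["has", "gone"]}
corrector = lambda span: fixes.get(tuple(span), span)

tokens = ["she", "has", "went", "home"]
print(correct_with_spans(tokens, [(1, 3)], corrector))
```

Because the corrector only decodes the short erroneous spans rather than the full sentence, inference cost scales with the amount of error, which is what yields the reported sub-50% time cost.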
arXiv Detail & Related papers (2020-10-07T08:29:11Z)
- Spelling Error Correction with Soft-Masked BERT [11.122964733563117]
A state-of-the-art method for the task selects a character from a list of candidates for correction at each position of the sentence on the basis of BERT.
The accuracy of the method can be sub-optimal because BERT does not have sufficient capability to detect whether there is an error at each position.
We propose a novel neural architecture to address the aforementioned issue, which consists of a network for error detection and a network for error correction based on BERT.
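The "soft-masking" connection between the two networks interpolates each token embedding toward the [MASK] embedding in proportion to the detector's error probability, e_i' = p_i * e_mask + (1 - p_i) * e_i. A minimal numeric sketch (toy 2-dimensional embeddings, illustrative probabilities; real embeddings are BERT-sized vectors):

```python
def soft_mask(embeddings, error_probs, mask_embedding):
    """Soft-Masked BERT's masking step: blend each token embedding with
    the [MASK] embedding, weighted by the detector's error probability.
        e_i' = p_i * e_mask + (1 - p_i) * e_i"""
    return [
        [p * m + (1 - p) * e for e, m in zip(emb, mask_embedding)]
        for emb, p in zip(embeddings, error_probs)
    ]

# Toy 2-d embeddings for a 3-token sentence (illustrative numbers).
embs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
mask = [9.0, 9.0]
probs = [0.0, 1.0, 0.5]   # detector: token 2 certainly wrong, token 3 unsure
print(soft_mask(embs, probs, mask))
```

A token the detector trusts (p near 0) reaches the correction network almost unchanged, while a likely error (p near 1) is effectively masked, letting BERT's masked-LM behavior propose a replacement.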
arXiv Detail & Related papers (2020-05-15T09:02:38Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.