Context-aware Stand-alone Neural Spelling Correction
- URL: http://arxiv.org/abs/2011.06642v1
- Date: Thu, 12 Nov 2020 20:34:49 GMT
- Title: Context-aware Stand-alone Neural Spelling Correction
- Authors: Xiangci Li, Hairong Liu, Liang Huang
- Abstract summary: We present a simple yet powerful solution that jointly detects and corrects misspellings as a sequence labeling task by fine-turning a pre-trained language model.
Our solution outperforms the previous state-of-the-art result by 12.8% absolute F0.5 score.
- Score: 11.643354740136953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing natural language processing systems are vulnerable to noisy inputs
resulting from misspellings. On the contrary, humans can easily infer the
corresponding correct words from their misspellings and surrounding context.
Inspired by this, we address the stand-alone spelling correction problem, which
only corrects the spelling of each token without additional token insertion or
deletion, by utilizing both spelling information and global context
representations. We present a simple yet powerful solution that jointly detects
and corrects misspellings as a sequence labeling task by fine-turning a
pre-trained language model. Our solution outperforms the previous
state-of-the-art result by 12.8% absolute F0.5 score.
Related papers
- Automatic Real-word Error Correction in Persian Text [0.0]
This paper introduces a cutting-edge approach for precise and efficient real-word error correction in Persian text.
We employ semantic analysis, feature selection, and advanced classifiers to enhance error detection and correction efficacy.
Our method achieves an impressive F-measure of 96.6% in the detection phase and an accuracy of 99.1% in the correction phase.
arXiv Detail & Related papers (2024-07-20T07:50:52Z) - PERCORE: A Deep Learning-Based Framework for Persian Spelling Correction with Phonetic Analysis [0.0]
This research introduces a state-of-the-art Persian spelling correction system that seamlessly integrates deep learning techniques with phonetic analysis.
Our methodology effectively combines deep contextual analysis with phonetic insights, adeptly correcting both non-word and real-word spelling errors.
A thorough evaluation on a wide-ranging dataset confirms our system's superior performance compared to existing methods.
arXiv Detail & Related papers (2024-07-20T07:41:04Z) - Understanding and Mitigating Classification Errors Through Interpretable
Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
arXiv Detail & Related papers (2023-11-18T00:24:26Z) - Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct the potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
arXiv Detail & Related papers (2023-08-17T06:04:28Z) - Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech
Recognition [49.42732949233184]
When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition.
Taking noisy labels as ground-truth in the loss function results in suboptimal performance.
We propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels.
arXiv Detail & Related papers (2023-08-12T12:13:52Z) - Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
arXiv Detail & Related papers (2023-04-11T13:42:10Z) - A transformer-based spelling error correction framework for Bangla and resource scarce Indic languages [2.5874041837241304]
Spelling error correction is the task of identifying and rectifying misspelled words in texts.
Earlier efforts on spelling error correction in Bangla and resource-scarce Indic languages focused on rule-based, statistical, and machine learning-based methods.
We propose a novel detector-purificator-corrector, DPC based on denoising transformers by addressing previous issues.
arXiv Detail & Related papers (2022-11-07T17:59:05Z) - Misspelling Correction with Pre-trained Contextual Language Model [0.0]
We present two experiments, based on BERT and the edit distance algorithm, for ranking and selecting candidate corrections.
The results of our experiments demonstrated that when combined properly, contextual word embeddings of BERT and edit distance are capable of effectively correcting spelling errors.
arXiv Detail & Related papers (2021-01-08T20:11:01Z) - NeuSpell: A Neural Spelling Correction Toolkit [88.79419580807519]
NeuSpell is an open-source toolkit for spelling correction in English.
It comprises ten different models, and benchmarks them on misspellings from multiple sources.
We train neural models using spelling errors in context, synthetically constructed by reverse engineering isolated misspellings.
arXiv Detail & Related papers (2020-10-21T15:53:29Z) - On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.