Hierarchical Transformer Encoders for Vietnamese Spelling Correction
- URL: http://arxiv.org/abs/2105.13578v1
- Date: Fri, 28 May 2021 04:09:15 GMT
- Title: Hierarchical Transformer Encoders for Vietnamese Spelling Correction
- Authors: Hieu Tran, Cuong V. Dinh, Long Phan, and Son T. Nguyen
- Abstract summary: We propose a Hierarchical Transformer model for the Vietnamese spelling correction problem.
The model consists of multiple Transformer encoders and utilizes both character-level and word-level representations to detect errors and make corrections.
- Score: 1.0779600811805266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a Hierarchical Transformer model for the
Vietnamese spelling correction problem. The model consists of multiple
Transformer encoders and utilizes both character-level and word-level
representations to detect errors and make corrections. In addition, to
facilitate future work on Vietnamese spelling correction tasks, we propose a
realistic dataset collected from real-life texts for the problem. We compare
our method with other methods and publicly available systems. The proposed
method outperforms all of the contemporary methods in terms of recall,
precision, and F1-score. A demo version is publicly available.
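The two-level design described in the abstract can be illustrated with a toy sketch: a character-level encoder pools each word's characters into a vector, and a word-level encoder then contextualizes those vectors across the sentence. The embedding size, the unparameterized attention, and the fusion by concatenation below are illustrative assumptions, not the paper's actual trained architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x):
    """Single-head scaled dot-product self-attention over rows of x.
    No learned projections; kept minimal for illustration."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

D = 16  # hypothetical embedding size
char_emb = {c: rng.normal(size=D) for c in "abcdefghijklmnopqrstuvwxyz"}

def encode_word_chars(word):
    # Character-level encoder: attend over characters, pool to one vector.
    chars = np.stack([char_emb.get(c, np.zeros(D)) for c in word])
    return self_attention(chars).mean(axis=0)

def hierarchical_encode(sentence):
    words = sentence.split()
    # Level 1: one vector per word, built from its characters.
    char_vecs = np.stack([encode_word_chars(w) for w in words])
    # Level 2: word-level encoder contextualizes the word vectors.
    word_vecs = self_attention(char_vecs)
    # Fuse both levels so a detector sees char- and word-level signals.
    return np.concatenate([char_vecs, word_vecs], axis=-1)

out = hierarchical_encode("toi yeu tieng viet")
print(out.shape)  # (4, 32): one fused vector per word
```

Feeding both levels forward lets the model catch character-level typos while using word-level context to decide the correction.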
Related papers
- EdaCSC: Two Easy Data Augmentation Methods for Chinese Spelling Correction [0.0]
Chinese Spelling Correction (CSC) aims to detect and correct spelling errors in Chinese sentences caused by phonetic or visual similarities.
We propose two data augmentation methods to address these limitations.
Firstly, we augment the dataset by either splitting long sentences into shorter ones or reducing typos in sentences with multiple typos.
arXiv Detail & Related papers (2024-09-08T14:29:10Z)
- A Combination of BERT and Transformer for Vietnamese Spelling Correction [0.0]
No such implementation exists for Vietnamese yet.
Our model outperforms other approaches as well as the Google Docs spell-checking tool, achieving an 86.24 BLEU score on this task.
arXiv Detail & Related papers (2024-05-04T05:24:19Z)
- LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction [49.0746090186582]
Over-correction is a critical problem in Chinese grammatical error correction (CGEC) task.
Recent work using model ensemble methods can effectively mitigate over-correction and improve the precision of the GEC system.
We propose the LM-Combiner, a rewriting model that can directly modify the over-correction of GEC system outputs without a model ensemble.
arXiv Detail & Related papers (2024-03-26T06:12:21Z)
- Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct the potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
arXiv Detail & Related papers (2023-08-17T06:04:28Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot jointly consider error position and type.
We build an FG-TED model to predict the addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- An Error-Guided Correction Model for Chinese Spelling Error Correction [13.56600372085612]
We propose an error-guided correction model (EGCM) to improve Chinese spelling correction.
Our model achieves superior performance against state-of-the-art approaches by a remarkable margin.
arXiv Detail & Related papers (2023-01-16T09:27:45Z)
- VSEC: Transformer-based Model for Vietnamese Spelling Correction [0.19116784879310028]
We propose a novel method to correct Vietnamese spelling errors.
We tackle the problems of mistyped errors and misspelled errors by using a deep learning model.
The experimental results show that our method achieves encouraging performance with 86.8% errors detected and 81.5% errors corrected.
arXiv Detail & Related papers (2021-11-01T00:55:32Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- Spelling Correction with Denoising Transformer [0.0]
We present a novel method of performing spelling correction on short input strings, such as search queries or individual words.
At its core lies a procedure for generating artificial typos which closely follow the error patterns manifested by humans.
This procedure is used to train the production spelling correction model based on a transformer architecture.
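The artificial-typo procedure described above can be sketched with a few human-like edit operations; the keyboard-neighbour map and the choice of operations here are illustrative assumptions, not the paper's actual error model.

```python
import random

# Hypothetical keyboard-neighbour map (tiny excerpt for illustration).
NEIGHBOURS = {
    "a": "qwsz", "e": "wrd", "i": "uok", "o": "ipl", "t": "rgy", "n": "bm",
}

def add_typo(word, rng):
    """Apply one human-like edit: substitute a keyboard neighbour,
    delete a character, or transpose two adjacent characters."""
    if len(word) < 2:
        return word
    op = rng.choice(["substitute", "delete", "transpose"])
    i = rng.randrange(len(word) - 1)
    if op == "substitute" and word[i] in NEIGHBOURS:
        return word[:i] + rng.choice(NEIGHBOURS[word[i]]) + word[i + 1:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    # Fall through to transposition (also used when the character has
    # no neighbour entry in the toy map above).
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

rng = random.Random(42)
clean = ["tonight", "spelling", "correction"]
# (noisy, target) pairs of the kind used to train a denoising corrector.
pairs = [(add_typo(w, rng), w) for w in clean]
```

Pairing each corrupted string with its clean original yields synthetic supervision without any manually annotated typo corpus.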
arXiv Detail & Related papers (2021-05-12T21:35:18Z)
- Neural String Edit Distance [77.72325513792981]
We propose the neural string edit distance model for string-pair classification and sequence generation.
We modify the original expectation-maximization learned edit distance algorithm into a differentiable loss function.
We show that we can trade off between performance and interpretability in a single framework.
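For reference, the classic Levenshtein recurrence that this line of work relaxes into a differentiable loss looks as follows; this is the standard dynamic program, not the paper's learned variant.

```python
def edit_distance(a, b):
    """Classic Levenshtein edit distance via dynamic programming.
    dp[i][j] = minimum edits turning a[:i] into b[:j]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[m][n]

print(edit_distance("tieng", "teing"))  # 2: a swap costs two substitutions
```

A differentiable version replaces the hard `min` with a soft, probabilistic aggregation so the edit costs can be learned by gradient descent.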
arXiv Detail & Related papers (2021-04-16T22:16:47Z)
- Improving Translation Robustness with Visual Cues and Error Correction [58.97421756225425]
We introduce the idea of visual context to improve translation robustness against noisy texts.
We also propose a novel error correction training regime by treating error correction as an auxiliary task.
arXiv Detail & Related papers (2021-03-12T15:31:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.