A Language Model for Grammatical Error Correction in L2 Russian
- URL: http://arxiv.org/abs/2307.01609v1
- Date: Tue, 4 Jul 2023 09:50:13 GMT
- Title: A Language Model for Grammatical Error Correction in L2 Russian
- Authors: Nikita Remnev, Sergei Obiedkov, Ekaterina Rakhilina, Ivan Smirnov,
Anastasia Vyrenkova
- Abstract summary: Grammatical error correction is one of the fundamental tasks in Natural Language Processing.
For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing.
We propose a pipeline involving a language model intended for correcting errors in L2 Russian writing.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Grammatical error correction is one of the fundamental tasks in Natural
Language Processing. For the Russian language, most of the spellcheckers
available correct typos and other simple errors with high accuracy, but often
fail when faced with non-native (L2) writing, since the latter contains errors
that are not typical for native speakers. In this paper, we propose a pipeline
involving a language model intended for correcting errors in L2 Russian
writing. The language model proposed is trained on untagged texts of the
Newspaper subcorpus of the Russian National Corpus, and the quality of the
model is validated against the RULEC-GEC corpus.
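The pipeline described in the abstract can be thought of as two stages: generating candidate corrections for each token and scoring them with a language model. The sketch below is a minimal illustration of that idea only; the toy corpus, bigram model, and edit-distance candidate generator are hypothetical stand-ins, not the authors' actual components trained on the Newspaper subcorpus of the Russian National Corpus:

```python
from collections import defaultdict

# Toy training corpus (stand-in for the Newspaper subcorpus).
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "a cat sat on the rug",
]

# Build vocabulary plus unigram/bigram counts.
unigram = defaultdict(int)
bigram = defaultdict(int)
for sent in CORPUS:
    toks = ["<s>"] + sent.split()
    for w in toks:
        unigram[w] += 1
    for a, b in zip(toks, toks[1:]):
        bigram[(a, b)] += 1
vocab = [w for w in unigram if w != "<s>"]

def edit_distance(a: str, b: str) -> int:
    # Classic Levenshtein distance with a rolling 1-D DP row.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def candidates(tok: str) -> set:
    # The token itself plus any vocabulary word within edit distance 1.
    return {tok} | {w for w in vocab if edit_distance(tok, w) <= 1}

def bigram_prob(prev: str, w: str) -> float:
    # Add-one smoothing so unseen pairs keep nonzero probability.
    return (bigram[(prev, w)] + 1) / (unigram[prev] + len(vocab) + 1)

def correct(sentence: str) -> str:
    # Greedy left-to-right decoding: pick the candidate the LM prefers.
    out, prev = [], "<s>"
    for tok in sentence.split():
        best = max(candidates(tok), key=lambda w: bigram_prob(prev, w))
        out.append(best)
        prev = best
    return " ".join(out)

print(correct("the caat sat on thw mat"))  # → "the cat sat on the mat"
```

Greedy decoding and a bigram model keep the sketch short; a realistic pipeline would use a far larger language model and beam search over candidate sequences.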
Related papers
- A Coin Has Two Sides: A Novel Detector-Corrector Framework for Chinese Spelling Correction [79.52464132360618]
Chinese Spelling Correction (CSC) stands as a foundational Natural Language Processing (NLP) task.
We introduce a novel approach based on error detector-corrector framework.
Our detector is designed to yield two error detection results, each characterized by high precision and recall.
arXiv Detail & Related papers (2024-09-06T09:26:45Z)
- Rectifier: Code Translation with Corrector via LLMs [11.38401806203093]
We propose a general corrector, namely Rectifier, which is a micro and universal model for repairing translation errors.
The experimental results on translation tasks between C++, Java, and Python show that our model has effective repair ability.
arXiv Detail & Related papers (2024-07-10T08:58:41Z)
- Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct the potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
arXiv Detail & Related papers (2023-08-17T06:04:28Z)
- Does Correction Remain A Problem For Large Language Models? [63.24433996856764]
This paper investigates the role of correction in the context of large language models by conducting two experiments.
The first experiment focuses on correction as a standalone task, employing few-shot learning techniques with GPT-like models for error correction.
The second experiment explores the notion of correction as a preparatory task for other NLP tasks, examining whether large language models can tolerate and perform adequately on texts containing certain levels of noise or errors.
arXiv Detail & Related papers (2023-08-03T14:09:31Z)
- Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora [0.0]
Grammatical error correction (GEC) is the task of correcting typos, spelling, punctuation and grammatical issues in text.
We show that a byte-level model enables higher correction quality than a subword approach.
arXiv Detail & Related papers (2023-05-29T06:35:40Z)
- CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z)
- Towards Lithuanian grammatical error correction [0.0]
We construct a grammatical error correction model for Lithuanian, a language rich in archaic features.
We compare subword and byte-level approaches and share our best trained model, achieving F$_{0.5}$=0.92, along with the accompanying code in an online open-source repository.
arXiv Detail & Related papers (2022-03-18T13:59:02Z)
- Understanding by Understanding Not: Modeling Negation in Language Models [81.21351681735973]
Negation is a core construction in natural language.
We propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences.
We reduce the mean top-1 error rate to 4% on the negated LAMA dataset.
arXiv Detail & Related papers (2021-05-07T21:58:35Z)
- Correcting Automated and Manual Speech Transcription Errors using Warped Language Models [2.8614709576106874]
We propose a novel approach that takes advantage of the robustness of warped language models to transcription noise for correcting transcriptions of spoken language.
We show that our proposed approach is able to achieve up to 10% reduction in word error rates of both automatic and manual transcriptions of spoken language.
arXiv Detail & Related papers (2021-03-26T16:43:23Z) - On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.