Towards Lithuanian grammatical error correction
- URL: http://arxiv.org/abs/2203.09963v1
- Date: Fri, 18 Mar 2022 13:59:02 GMT
- Title: Towards Lithuanian grammatical error correction
- Authors: Lukas Stankevičius and Mantas Lukoševičius
- Abstract summary: We construct a grammatical error correction model for Lithuanian, a language rich in archaic features.
We compare subword and byte-level approaches and share our best trained model, achieving F$_{0.5}$=0.92, and accompanying code, in an online open-source repository.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Everyone wants to write beautiful and correct text, yet the lack of language
skills, experience, or hasty typing can result in errors. By employing the
recent advances in transformer architectures, we construct a grammatical error
correction model for Lithuanian, a language rich in archaic features. We
compare subword and byte-level approaches and share our best trained model,
achieving F$_{0.5}$=0.92, and accompanying code, in an online open-source
repository.
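The reported F$_{0.5}$ score weights precision twice as heavily as recall, which suits grammatical error correction: introducing a wrong "correction" is usually worse than missing an error. A minimal sketch of the metric follows; the edit counts are illustrative, not taken from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from true-positive, false-positive, and
    false-negative edit counts. beta < 1 favours precision."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative counts: 90 correct edits, 10 spurious, 10 missed.
print(round(f_beta(90, 10, 10), 2))  # → 0.9
```

With precision and recall both 0.9 the score is 0.9; pushing precision up moves F$_{0.5}$ four times as fast as the same gain in recall.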
Related papers
- From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages [0.5706164516481158]
We propose a model-agnostic cost-effective approach to developing bilingual base large language models (LLMs) to support English and any target language.
We performed experiments with three languages, each using a non-Latin script - Ukrainian, Arabic, and Georgian.
arXiv Detail & Related papers (2024-10-24T15:20:54Z)
- GEE! Grammar Error Explanation with Large Language Models [64.16199533560017]
We propose the task of grammar error explanation, where a system needs to provide one-sentence explanations for each grammatical error in a pair of erroneous and corrected sentences.
We analyze the capability of GPT-4 in grammar error explanation, and find that it only produces explanations for 60.2% of the errors using one-shot prompting.
We develop a two-step pipeline that leverages fine-tuned and prompted large language models to perform structured atomic token edit extraction.
arXiv Detail & Related papers (2023-11-16T02:45:47Z)
- A Language Model for Grammatical Error Correction in L2 Russian [0.3149883354098941]
Grammatical error correction is one of the fundamental tasks in Natural Language Processing.
For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing.
We propose a pipeline involving a language model intended for correcting errors in L2 Russian writing.
arXiv Detail & Related papers (2023-07-04T09:50:13Z)
- Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora [0.0]
Grammatical error correction (GEC) is the task of correcting typos, spelling, punctuation and grammatical issues in text.
We show that a byte-level model enables higher correction quality than a subword approach.
arXiv Detail & Related papers (2023-05-29T06:35:40Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot jointly consider error position and type.
We build an FG-TED model to predict addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- GrammarTagger: A Multilingual, Minimally-Supervised Grammar Profiler for Language Education [7.517366022163375]
We present GrammarTagger, an open-source grammar profiler which, given an input text, identifies grammatical features useful for language education.
The model architecture enables it to learn from a small amount of texts annotated with spans and their labels.
We also build Octanove Learn, a search engine of language learning materials indexed by their reading difficulty and grammatical features.
arXiv Detail & Related papers (2021-04-07T15:31:20Z)
- Correcting Automated and Manual Speech Transcription Errors using Warped Language Models [2.8614709576106874]
We propose a novel approach that takes advantage of the robustness of warped language models to transcription noise for correcting transcriptions of spoken language.
We show that our proposed approach is able to achieve up to 10% reduction in word error rates of both automatic and manual transcriptions of spoken language.
arXiv Detail & Related papers (2021-03-26T16:43:23Z)
- Improving Translation Robustness with Visual Cues and Error Correction [58.97421756225425]
We introduce the idea of visual context to improve translation robustness against noisy texts.
We also propose a novel error correction training regime by treating error correction as an auxiliary task.
arXiv Detail & Related papers (2021-03-12T15:31:34Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction starting from almost no training examples, with models improving as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
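The abstract and several entries above contrast subword with byte-level modeling. For Lithuanian this choice matters because its diacritic characters each expand to two bytes in UTF-8: byte-level models see longer sequences but never face out-of-vocabulary tokens. A minimal illustration (the example word is my own, not from the paper):

```python
# Lithuanian diacritic characters (ž, ą, ė, ų, ...) occupy two bytes
# each in UTF-8, so a byte-level model processes longer sequences
# than the character count suggests.
word = "žąsis"  # "goose"
print(len(word))                  # 5 characters
print(len(word.encode("utf-8")))  # 7 bytes: ž and ą take two bytes each
```

This length inflation is the usual trade-off cited for byte-level models: no unknown tokens and no learned tokenizer, at the cost of longer inputs.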
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.