Grammatical Error Correction: A Survey of the State of the Art
- URL: http://arxiv.org/abs/2211.05166v4
- Date: Sat, 29 Apr 2023 08:33:33 GMT
- Title: Grammatical Error Correction: A Survey of the State of the Art
- Authors: Christopher Bryant, Zheng Yuan, Muhammad Reza Qorib, Hannan Cao, Hwee
Tou Ng, Ted Briscoe
- Abstract summary: Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text.
The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks.
- Score: 15.174807142080187
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Grammatical Error Correction (GEC) is the task of automatically detecting and
correcting errors in text. The task not only includes the correction of
grammatical errors, such as missing prepositions and mismatched subject-verb
agreement, but also orthographic and semantic errors, such as misspellings and
word choice errors respectively. The field has seen significant progress in the
last decade, motivated in part by a series of five shared tasks, which drove
the development of rule-based methods, statistical classifiers, statistical
machine translation, and finally neural machine translation systems which
represent the current dominant state of the art. In this survey paper, we
condense the field into a single article and first outline some of the
linguistic challenges of the task, introduce the most popular datasets that are
available to researchers (for both English and other languages), and summarise
the various methods and techniques that have been developed with a particular
focus on artificial error generation. We next describe the many different
approaches to evaluation as well as concerns surrounding metric reliability,
especially in relation to subjective human judgements, before concluding with
an overview of recent progress and suggestions for future work and remaining
challenges. We hope that this survey will serve as a comprehensive resource for
researchers who are new to the field or who want to be kept apprised of recent
developments.
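Since the survey gives particular focus to artificial error generation, a minimal sketch of the idea follows: clean sentences are corrupted with synthetic errors to produce pseudo-parallel (source, target) training pairs. The operations and probabilities below are illustrative assumptions, not a recipe from the survey; production systems typically learn error distributions from annotated learner corpora or train a back-translation-style corruption model.

```python
import random

# Illustrative error-injection operations; hand-written rules stand in
# here for learned error distributions.
PREPOSITIONS = {"in", "on", "at", "to", "for", "of", "with"}

def corrupt(sentence: str, p_drop: float = 0.3, p_swap: float = 0.1) -> str:
    """Inject synthetic errors (missing prepositions, word-order swaps)."""
    # Drop each preposition with probability p_drop (missing-preposition errors).
    tokens = [t for t in sentence.split()
              if not (t.lower() in PREPOSITIONS and random.random() < p_drop)]
    # Swap one adjacent token pair with probability p_swap (word-order errors).
    if len(tokens) > 1 and random.random() < p_swap:
        i = random.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return " ".join(tokens)

# Pseudo-parallel pair: corrupted source, clean target.
clean = "She is interested in the history of the city."
print((corrupt(clean), clean))
```

Pairs like these can be generated at scale from any monolingual corpus, which is what makes artificial error generation attractive when annotated GEC data is scarce.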
Related papers
- A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance [1.7000578646860536]
Spelling mistakes are among the most prevalent writing errors and arise from a variety of factors.
This research aims to identify and rectify diverse spelling errors in text using neural networks (a Levenshtein-distance sketch appears after the list below).
arXiv Detail & Related papers (2024-07-24T16:07:11Z)
- A Comparative Study of Transformer-based Neural Text Representation Techniques on Bug Triaging [8.831760500324318]
We offer one of the first investigations that fine-tune transformer-based language models for the task of bug triaging.
DeBERTa is the most effective technique across the triaging tasks of developer and component assignment.
arXiv Detail & Related papers (2023-10-10T18:09:32Z)
- A Methodology for Generative Spelling Correction via Natural Spelling Errors Emulation across Multiple Domains and Languages [39.75847219395984]
We present a methodology for generative spelling correction (SC), tested on English and Russian.
We study how such errors can be emulated in correct sentences to effectively enrich generative models' pre-training procedure.
As a practical outcome of our work, we introduce SAGE (Spell checking via Augmentation and Generative distribution Emulation).
arXiv Detail & Related papers (2023-08-18T10:07:28Z)
- Recent Advances in Direct Speech-to-text Translation [58.692782919570845]
We categorize the existing research work into three directions based on the main challenges -- modeling burden, data scarcity, and application issues.
For the challenge of data scarcity, recent work resorts to many sophisticated techniques, such as data augmentation, pre-training, knowledge distillation, and multilingual modeling.
We analyze and summarize the application issues, which include real-time processing, segmentation, named entities, gender bias, and code-switching.
arXiv Detail & Related papers (2023-06-20T16:14:27Z)
- MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types [68.76742370525234]
We propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts.
Inspired by the recent efforts in several NLP tasks for fine-grained evaluation, we introduce a set of 13 mismatch error types.
We show that the mismatch errors between the sentence pairs on the held-out datasets from 7 NLP tasks align well with the human evaluation.
arXiv Detail & Related papers (2023-06-18T01:38:53Z)
- Persian Typographical Error Type Detection Using Deep Neural Networks on Algorithmically-Generated Misspellings [2.2503811834154104]
Typographical Error Type Detection in Persian is a relatively understudied area.
This paper presents a compelling approach for detecting typographical errors in Persian texts.
Our final method proved highly competitive, achieving 97.62% accuracy, 98.83% precision, and 98.61% recall, while surpassing other methods in speed.
arXiv Detail & Related papers (2023-05-19T15:05:39Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
- Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition [52.55136323341319]
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that even humans cannot easily recognize them.
arXiv Detail & Related papers (2022-04-15T13:55:32Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale [52.663117551150954]
A few popular metrics remain the de facto standard for evaluating tasks such as image captioning and machine translation.
This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them.
In this paper, we urge the community to consider more carefully how they automatically evaluate their models.
arXiv Detail & Related papers (2020-10-26T13:57:20Z)
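For GEC itself, the standard automatic metric that the survey's evaluation discussion revolves around is edit-level F0.5 (as reported by the M2 scorer and ERRANT), which weights precision twice as heavily as recall on the grounds that a bad correction is worse than a missed one. A minimal sketch, assuming the true-positive, false-positive, and false-negative edit counts have already been computed:

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Edit-level F_beta; beta=0.5 weights precision twice as much as recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical system output: 40 edits match the reference annotation,
# 10 are spurious, and 50 reference edits were missed.
print(round(f_beta(40, 10, 50), 2))  # precision 0.80, recall 0.44 -> F0.5 = 0.69
```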
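As for the misspelling-correction paper listed first above, the Levenshtein distance named in its title is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A standard dynamic-programming sketch, not that paper's implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting i characters turns a[:i] into the empty string
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute ca -> cb
        prev = curr
    return prev[-1]

print(levenshtein("recieve", "receive"))  # 2: the "ie" swap costs two substitutions
```

Spell checkers typically use this distance to rank candidate corrections drawn from a dictionary by their closeness to the misspelled word.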