Grammatical Error Correction: A Survey of the State of the Art
- URL: http://arxiv.org/abs/2211.05166v4
- Date: Sat, 29 Apr 2023 08:33:33 GMT
- Title: Grammatical Error Correction: A Survey of the State of the Art
- Authors: Christopher Bryant, Zheng Yuan, Muhammad Reza Qorib, Hannan Cao, Hwee
Tou Ng, Ted Briscoe
- Abstract summary: Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text.
The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks.
- Score: 15.174807142080187
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Grammatical Error Correction (GEC) is the task of automatically detecting and
correcting errors in text. The task not only includes the correction of
grammatical errors, such as missing prepositions and mismatched subject-verb
agreement, but also orthographic and semantic errors, such as misspellings and
word choice errors respectively. The field has seen significant progress in the
last decade, motivated in part by a series of five shared tasks, which drove
the development of rule-based methods, statistical classifiers, statistical
machine translation, and finally neural machine translation systems which
represent the current dominant state of the art. In this survey paper, we
condense the field into a single article and first outline some of the
linguistic challenges of the task, introduce the most popular datasets that are
available to researchers (for both English and other languages), and summarise
the various methods and techniques that have been developed with a particular
focus on artificial error generation. We next describe the many different
approaches to evaluation as well as concerns surrounding metric reliability,
especially in relation to subjective human judgements, before concluding with
an overview of recent progress and suggestions for future work and remaining
challenges. We hope that this survey will serve as a comprehensive resource for
researchers who are new to the field or who want to be kept apprised of recent
developments.
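Since the survey gives particular focus to artificial error generation, a minimal sketch of the idea follows: clean sentences are corrupted with synthetic errors to produce pseudo-parallel (source, target) training pairs. The operations and probabilities below are illustrative assumptions, not a recipe from the survey; production systems typically learn error distributions from annotated learner corpora or train a back-translation-style corruption model.

```python
import random

# Illustrative error-injection operations; hand-written rules stand in
# here for learned error distributions.
PREPOSITIONS = {"in", "on", "at", "to", "for", "of", "with"}

def corrupt(sentence: str, p_drop: float = 0.3, p_swap: float = 0.1) -> str:
    """Inject synthetic errors (missing prepositions, word-order swaps)."""
    # Drop each preposition with probability p_drop (missing-preposition errors).
    tokens = [t for t in sentence.split()
              if not (t.lower() in PREPOSITIONS and random.random() < p_drop)]
    # Swap one adjacent token pair with probability p_swap (word-order errors).
    if len(tokens) > 1 and random.random() < p_swap:
        i = random.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return " ".join(tokens)

# Pseudo-parallel pair: corrupted source, clean target.
clean = "She is interested in the history of the city."
print((corrupt(clean), clean))
```

Pairs like these can be generated at scale from any monolingual corpus, which is what makes artificial error generation attractive when annotated GEC data is scarce.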
Related papers
- A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance [1.7000578646860536]
Spelling mistakes are among the most prevalent writing errors and arise from a variety of factors.
This research aims to identify and rectify diverse spelling errors in text using neural networks (a Levenshtein-distance sketch appears after the list below).
arXiv Detail & Related papers (2024-07-24T16:07:11Z)
- A Comparative Study of Transformer-based Neural Text Representation Techniques on Bug Triaging [8.831760500324318]
We offer one of the first investigations that fine-tune transformer-based language models for the task of bug triaging.
DeBERTa is the most effective technique across the triaging tasks of developer and component assignment.
arXiv Detail & Related papers (2023-10-10T18:09:32Z)
- A Methodology for Generative Spelling Correction via Natural Spelling Errors Emulation across Multiple Domains and Languages [39.75847219395984]
We present a methodology for generative spelling correction (SC), tested on English and Russian.
We study how such errors can be emulated in correct sentences to effectively enrich generative models' pre-training procedure.
As a practical outcome of our work, we introduce SAGE (Spell checking via Augmentation and Generative distribution Emulation).
arXiv Detail & Related papers (2023-08-18T10:07:28Z)
- Recent Advances in Direct Speech-to-text Translation [58.692782919570845]
We categorize the existing research work into three directions based on the main challenges -- modeling burden, data scarcity, and application issues.
For the challenge of data scarcity, recent work resorts to many sophisticated techniques, such as data augmentation, pre-training, knowledge distillation, and multilingual modeling.
We analyze and summarize the application issues, which include real-time processing, segmentation, named entities, gender bias, and code-switching.
arXiv Detail & Related papers (2023-06-20T16:14:27Z)
- MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types [68.76742370525234]
We propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts.
Inspired by the recent efforts in several NLP tasks for fine-grained evaluation, we introduce a set of 13 mismatch error types.
We show that the mismatch errors between the sentence pairs on the held-out datasets from 7 NLP tasks align well with the human evaluation.
arXiv Detail & Related papers (2023-06-18T01:38:53Z)
- Persian Typographical Error Type Detection Using Deep Neural Networks on Algorithmically-Generated Misspellings [2.2503811834154104]
Typographical Error Type Detection in Persian is a relatively understudied area.
This paper presents a compelling approach for detecting typographical errors in Persian texts.
Our final method proved highly competitive, achieving 97.62% accuracy, 98.83% precision, and 98.61% recall, while surpassing other methods in speed.
arXiv Detail & Related papers (2023-05-19T15:05:39Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
- Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition [52.55136323341319]
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that even humans cannot easily recognize them.
arXiv Detail & Related papers (2022-04-15T13:55:32Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale [52.663117551150954]
A few popular metrics remain the de facto standard for evaluating tasks such as image captioning and machine translation.
This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them.
In this paper, we urge the community to consider more carefully how they automatically evaluate their models.
arXiv Detail & Related papers (2020-10-26T13:57:20Z)
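For GEC itself, the standard automatic metric that the survey's evaluation discussion revolves around is edit-level F0.5 (as reported by the M2 scorer and ERRANT), which weights precision twice as heavily as recall on the grounds that a bad correction is worse than a missed one. A minimal sketch, assuming the true-positive, false-positive, and false-negative edit counts have already been computed:

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Edit-level F_beta; beta=0.5 weights precision twice as much as recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical system output: 40 edits match the reference annotation,
# 10 are spurious, and 50 reference edits were missed.
print(round(f_beta(40, 10, 50), 2))  # precision 0.80, recall 0.44 -> F0.5 = 0.69
```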
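As for the misspelling-correction paper listed first above, the Levenshtein distance named in its title is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A standard dynamic-programming sketch, not that paper's implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting i characters turns a[:i] into the empty string
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute ca -> cb
        prev = curr
    return prev[-1]

print(levenshtein("recieve", "receive"))  # 2: the "ie" swap costs two substitutions
```

Spell checkers typically use this distance to rank candidate corrections drawn from a dictionary by their closeness to the misspelled word.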