Prompting open-source and commercial language models for grammatical
error correction of English learner text
- URL: http://arxiv.org/abs/2401.07702v1
- Date: Mon, 15 Jan 2024 14:19:47 GMT
- Title: Prompting open-source and commercial language models for grammatical
error correction of English learner text
- Authors: Christopher Davis, Andrew Caines, Øistein Andersen, Shiva
Taslimipoor, Helen Yannakoudakis, Zheng Yuan, Christopher Bryant, Marek Rei,
Paula Buttery
- Abstract summary: Large language models (LLMs) can be prompted to produce texts which are fluent and grammatical.
We evaluate how well LLMs can perform at grammatical error correction (GEC) by measuring their performance on established benchmark datasets.
We find that several open-source models outperform commercial ones on minimal edit benchmarks, and that in some settings zero-shot prompting is just as competitive as few-shot prompting.
- Score: 19.192210777082053
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Thanks to recent advances in generative AI, we are able to prompt large
language models (LLMs) to produce texts which are fluent and grammatical. In
addition, it has been shown that we can elicit attempts at grammatical error
correction (GEC) from LLMs when prompted with ungrammatical input sentences. We
evaluate how well LLMs can perform at GEC by measuring their performance on
established benchmark datasets. We go beyond previous studies, which only
examined GPT* models on a selection of English GEC datasets, by evaluating
seven open-source and three commercial LLMs on four established GEC benchmarks.
We investigate model performance and report results against individual error
types. Our results indicate that LLMs do not always outperform supervised
English GEC models except in specific contexts -- namely commercial LLMs on
benchmarks annotated with fluency corrections as opposed to minimal edits. We
find that several open-source models outperform commercial ones on minimal edit
benchmarks, and that in some settings zero-shot prompting is just as
competitive as few-shot prompting.
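As an illustration of the zero-shot vs. few-shot prompting contrast discussed in the abstract, the sketch below shows how such GEC prompts might be assembled. The template wording, example pairs, and `build_prompt` helper are invented for illustration and are not the paper's actual prompts or code.

```python
# Minimal sketch of zero-shot vs. few-shot GEC prompt construction.
# The instruction text and demonstration pairs are hypothetical.

ZERO_SHOT_TEMPLATE = (
    "Correct the grammatical errors in the following sentence. "
    "Return only the corrected sentence.\n"
    "Sentence: {source}\nCorrection:"
)

# Hypothetical few-shot demonstrations (source, correction).
FEW_SHOT_EXAMPLES = [
    ("She no went to the market.", "She did not go to the market."),
    ("I am agree with you.", "I agree with you."),
]

def build_prompt(source: str, few_shot: bool = False) -> str:
    """Assemble a GEC prompt, optionally prepending worked examples."""
    if not few_shot:
        return ZERO_SHOT_TEMPLATE.format(source=source)
    demos = "\n".join(
        f"Sentence: {src}\nCorrection: {tgt}"
        for src, tgt in FEW_SHOT_EXAMPLES
    )
    return (
        "Correct the grammatical errors in each sentence. "
        "Return only the corrected sentence.\n"
        f"{demos}\nSentence: {source}\nCorrection:"
    )
```

In this framing, the abstract's finding that zero-shot can match few-shot prompting amounts to saying the demonstration block often adds little over the bare instruction.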
Related papers
- Idiosyncrasies in Large Language Models [54.26923012617675]
We unveil and study idiosyncrasies in Large Language Models (LLMs).
We find that fine-tuning existing text embedding models on LLM-generated texts yields excellent classification accuracy.
We leverage LLM as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies.
arXiv Detail & Related papers (2025-02-17T18:59:02Z)
- Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction [19.95974494301433]
Grammatical error correction (GEC) aims to correct grammatical, spelling, and semantic errors in natural language text.
We propose a novel retrieval method based on natural language grammatical error explanations (GEE).
Our method retrieves suitable few-shot demonstrations by matching the GEE of the test input with that of pre-constructed database samples.
arXiv Detail & Related papers (2025-02-12T15:41:43Z) - When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages [9.138590152838754]
Segment-level quality estimation (QE) is a challenging cross-lingual language understanding task.
We comprehensively evaluate large language models (LLMs) in zero/few-shot scenarios.
Our results indicate that prompt-based approaches are outperformed by the encoder-based fine-tuned QE models.
arXiv Detail & Related papers (2025-01-08T12:54:05Z) - Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary of the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z) - Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction [8.655807096424732]
In this paper, we propose a novel ungrammatical-syntax-based in-context example selection strategy for grammatical error correction.
Specifically, we measure similarity of sentences based on their syntactic structures with diverse algorithms, and identify optimal ICL examples sharing the most similar ill-formed syntax to the test input.
arXiv Detail & Related papers (2024-03-28T10:05:57Z) - Native Language Identification with Large Language Models [60.80452362519818]
We show that GPT models are proficient at NLI classification, with GPT-4 setting a new performance record of 91.7% on the TOEFL11 benchmark test set in a zero-shot setting.
We also show that unlike previous fully-supervised settings, LLMs can perform NLI without being limited to a set of known classes.
arXiv Detail & Related papers (2023-12-13T00:52:15Z) - An Examination of the Compositionality of Large Generative Vision-Language Models [7.639748270719836]
Generative Vision-Language Models (GVLMs) have been constructed via multimodal instruction tuning.
In this paper, we examine both the evaluation metrics (VisualGPTScore, etc.) and current benchmarks for evaluating the compositionality of GVLMs.
We identify the syntactical bias in current benchmarks, which is exploited by the linguistic capability of GVLMs.
arXiv Detail & Related papers (2023-08-21T06:50:29Z) - Evaluating the Capability of Large-scale Language Models on Chinese
Grammatical Error Correction Task [10.597024796304016]
Large-scale language models (LLMs) have shown remarkable capabilities in a variety of Natural Language Processing (NLP) tasks.
This report explores how large language models perform on Chinese grammatical error correction tasks.
arXiv Detail & Related papers (2023-07-08T13:10:59Z) - Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing a few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z) - A Unified Strategy for Multilingual Grammatical Error Correction with
Pre-trained Cross-Lingual Language Model [100.67378875773495]
We propose a generic and language-independent strategy for multilingual Grammatical Error Correction.
Our approach creates diverse parallel GEC data without any language-specific operations.
It achieves state-of-the-art results on the NLPCC 2018 Task 2 dataset (Chinese) and obtains competitive performance on Falko-Merlin (German) and RULEC-GEC (Russian).
arXiv Detail & Related papers (2022-01-26T02:10:32Z) - LM-Critic: Language Models for Unsupervised Grammatical Error Correction [128.9174409251852]
We show how to leverage a pretrained language model (LM) in defining an LM-Critic, which judges whether a sentence is grammatical.
We apply this LM-Critic and BIFI along with a large set of unlabeled sentences to bootstrap realistic ungrammatical / grammatical pairs for training a corrector.
arXiv Detail & Related papers (2021-09-14T17:06:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.