Evaluating the Capability of Large-scale Language Models on Chinese
Grammatical Error Correction Task
- URL: http://arxiv.org/abs/2307.03972v1
- Date: Sat, 8 Jul 2023 13:10:59 GMT
- Title: Evaluating the Capability of Large-scale Language Models on Chinese
Grammatical Error Correction Task
- Authors: Fanyi Qu and Yunfang Wu
- Abstract summary: Large-scale language models (LLMs) have shown remarkable capability in a variety of Natural Language Processing (NLP) tasks.
This report explores how large language models perform on Chinese grammatical error correction tasks.
- Score: 10.597024796304016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale language models (LLMs) have shown remarkable capability in a
variety of Natural Language Processing (NLP) tasks and have attracted a lot of
attention recently. However, some studies have indicated that large language
models fail to surpass state-of-the-art models in English
grammatical error correction (GEC) tasks. In this report, we explore
how large language models perform on Chinese grammatical error correction tasks
and provide guidance for future work. We conduct experiments with 3 different
LLMs of different model scales on 4 Chinese GEC datasets. Our experimental
results indicate that the performance of LLMs on automatic evaluation metrics
falls short of the previous state-of-the-art models because of
over-correction. Furthermore, we also discover notable variations in the
performance of LLMs when evaluated on different data distributions. Our
findings demonstrate that further investigation is required for the
application of LLMs to the Chinese GEC task.
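The link between over-correction and lower automatic scores can be illustrated with a small sketch. It assumes an edit-based precision/recall/F0.5 metric, which is common in GEC evaluation but is not named in the abstract; the edit tuples `(start, end, replacement)` and the function names are hypothetical and used only for illustration.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """Return (precision, recall, F_beta) from edit counts.

    beta=0.5 weights precision more heavily than recall (assumed here,
    not stated in the abstract).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def score_edits(hyp_edits, gold_edits, beta=0.5):
    """Score system edits against gold edits by exact match."""
    hyp, gold = set(hyp_edits), set(gold_edits)
    tp = len(hyp & gold)   # proposed edits that match the reference
    fp = len(hyp - gold)   # unnecessary edits (over-correction)
    fn = len(gold - hyp)   # reference edits the system missed
    return f_beta(tp, fp, fn, beta)


# Hypothetical edits: (start, end, replacement)
gold = [(2, 3, "A"), (5, 6, "B")]

# A conservative system makes one correct edit and misses the other.
conservative = score_edits([(2, 3, "A")], gold)

# An over-correcting system finds both gold edits but adds four extra ones.
extra = [(0, 1, "x"), (7, 8, "y"), (9, 10, "z"), (11, 12, "w")]
aggressive = score_edits(gold + extra, gold)

print("conservative P/R/F0.5:", conservative)
print("aggressive   P/R/F0.5:", aggressive)
```

In this toy case the over-correcting system reaches full recall but lower precision, so its F0.5 is lower than that of the conservative system, even though it fixed more of the actual errors.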
Related papers
- What do Large Language Models Need for Machine Translation Evaluation? [12.42394213466485]
Large language models (LLMs) can achieve results comparable to fine-tuned multilingual pre-trained language models.
This paper explores what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate machine translation quality.
arXiv Detail & Related papers (2024-10-04T09:50:45Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- Harnessing Large Language Models as Post-hoc Correctors [6.288056740658763]
We show that an LLM can work as a post-hoc corrector to propose corrections for the predictions of an arbitrary Machine Learning model.
We form a contextual knowledge database by incorporating the dataset's label information and the ML model's predictions on the validation dataset.
Our experimental results on text analysis and the challenging molecular predictions show that the LLM corrector improves the performance of a number of models by up to 39%.
arXiv Detail & Related papers (2024-02-20T22:50:41Z)
- Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction [62.409807640887834]
Chinese Grammatical Error Correction (CGEC) aims to correct all potential grammatical errors in the input sentences.
LLMs' performance as correctors on CGEC remains unsatisfactory due to its challenging task focus.
We rethink the roles of LLMs in the CGEC task so that they can be better utilized and explored in CGEC.
arXiv Detail & Related papers (2024-02-18T01:40:34Z)
- Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation [64.5862977630713]
This study investigates how Large Language Models (LLMs) leverage source and reference data in the machine translation evaluation task.
We find that reference information significantly enhances evaluation accuracy, while, surprisingly, source information is sometimes counterproductive.
arXiv Detail & Related papers (2024-01-12T13:23:21Z)
- Adapting Large Language Models for Document-Level Machine Translation [46.370862171452444]
Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks.
Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning.
This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs.
arXiv Detail & Related papers (2024-01-12T09:29:13Z)
- Are Large Language Models Good Fact Checkers: A Preliminary Study [26.023148371263012]
Large Language Models (LLMs) have drawn significant attention due to their outstanding reasoning capabilities and extensive knowledge repository.
This study aims to comprehensively evaluate various LLMs in tackling specific fact-checking subtasks.
arXiv Detail & Related papers (2023-11-29T05:04:52Z)
- BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS).
We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting.
Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Revisiting Distance Metric Learning for Few-Shot Natural Language Classification [1.0323063834827415]
Under few-shot learning settings, proxy-based distance metric learning (DML) losses in particular can positively affect the fine-tuning and inference of a supervised language model.
Models tuned with a combination of CCE and ProxyAnchor Loss have, on average, the best performance and outperform models with only CCE by about 3.27 percentage points.
arXiv Detail & Related papers (2022-11-28T10:19:31Z)
- Examining Scaling and Transfer of Language Model Architectures for Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, and few studies have investigated the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.