Eval-GCSC: A New Metric for Evaluating ChatGPT's Performance in Chinese
Spelling Correction
- URL: http://arxiv.org/abs/2311.08219v1
- Date: Tue, 14 Nov 2023 14:56:33 GMT
- Title: Eval-GCSC: A New Metric for Evaluating ChatGPT's Performance in Chinese
Spelling Correction
- Authors: Kunting Li, Yong Hu, Shaolei Wang, Hanhan Ma, Liang He, Fandong Meng,
Jie Zhou
- Abstract summary: ChatGPT has demonstrated impressive performance in various downstream tasks.
In the Chinese Spelling Correction (CSC) task, we observe a discrepancy: while ChatGPT performs well under human evaluation, it scores poorly according to traditional metrics.
This paper proposes a new evaluation metric: Eval-GCSC. By incorporating word-level and semantic similarity judgments, it relaxes the stringent length and phonics constraints.
- Score: 60.32771192285546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ChatGPT has demonstrated impressive performance in various downstream tasks.
However, in the Chinese Spelling Correction (CSC) task, we observe a
discrepancy: while ChatGPT performs well under human evaluation, it scores
poorly according to traditional metrics. We believe this inconsistency arises
because the traditional metrics are not well-suited for evaluating generative
models. Their overly strict length and phonics constraints may lead to
underestimating ChatGPT's correction capabilities. To better evaluate
generative models in the CSC task, this paper proposes a new evaluation metric:
Eval-GCSC. By incorporating word-level and semantic similarity judgments, it
relaxes the stringent length and phonics constraints. Experimental results show
that Eval-GCSC closely aligns with human evaluations. Under this metric,
ChatGPT's performance is comparable to traditional token-level classification
models (TCM), demonstrating its potential as a CSC tool. The source code and
scripts can be accessed at https://github.com/ktlKTL/Eval-GCSC.
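To make the idea concrete, the following is a rough, hypothetical sketch of how a "generative CSC" score in this spirit could combine word-level overlap with semantic similarity instead of enforcing an exact, same-length, character-aligned match. The function names, the encoder choice, and the equal weighting are illustrative assumptions only; the official metric is defined in the repository above.

```python
# Hypothetical sketch only, NOT the official Eval-GCSC implementation
# (see https://github.com/ktlKTL/Eval-GCSC for the real metric).
# Idea: score a generative correction by word-level overlap plus semantic
# similarity, rather than strict same-length, character-aligned matching.
from difflib import SequenceMatcher

from sentence_transformers import SentenceTransformer, util  # assumed encoder library

# Multilingual encoder picked purely for illustration.
_encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")


def word_level_similarity(hypothesis: str, reference: str) -> float:
    """Surface overlap that tolerates insertions/deletions (no length constraint)."""
    return SequenceMatcher(None, hypothesis, reference).ratio()


def semantic_similarity(hypothesis: str, reference: str) -> float:
    """Cosine similarity between sentence embeddings."""
    emb = _encoder.encode([hypothesis, reference], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]))


def gcsc_like_score(hypothesis: str, reference: str,
                    w_word: float = 0.5, w_sem: float = 0.5) -> float:
    """Blend surface overlap and meaning preservation into one [0, 1] score.
    The equal weighting is an arbitrary choice for this sketch."""
    return (w_word * word_level_similarity(hypothesis, reference)
            + w_sem * semantic_similarity(hypothesis, reference))


if __name__ == "__main__":
    reference = "我今天很高兴。"      # gold correction
    hypothesis = "今天我非常高兴。"    # longer paraphrase a generative model might output
    print(f"word-level: {word_level_similarity(hypothesis, reference):.3f}")
    print(f"semantic:   {semantic_similarity(hypothesis, reference):.3f}")
    print(f"combined:   {gcsc_like_score(hypothesis, reference):.3f}")
```

The point the sketch tries to capture is that a fluent, longer paraphrase, which strict length- and character-aligned metrics mark as wrong, can still receive credit from the semantic term.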
Related papers
- C-LLM: Learn to Check Chinese Spelling Errors Character by Character [61.53865964535705]
We propose C-LLM, a Large Language Model-based Chinese Spell Checking method that learns to check errors Character by Character.
C-LLM achieves an average improvement of 10% over existing methods.
arXiv Detail & Related papers (2024-06-24T11:16:31Z) - Revisiting Meta-evaluation for Grammatical Error Correction [14.822205658480813]
SEEDA is a new dataset for GEC meta-evaluation.
It consists of corrections with human ratings along two different granularities.
The results suggest that edit-based metrics may have been underestimated in existing studies.
arXiv Detail & Related papers (2024-03-05T05:53:09Z) - Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
arXiv Detail & Related papers (2023-08-17T06:04:28Z) - CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67×.
arXiv Detail & Related papers (2023-05-27T03:54:09Z) - Is ChatGPT a Highly Fluent Grammatical Error Correction System? A
Comprehensive Evaluation [41.94480044074273]
ChatGPT is a large-scale language model based on the advanced GPT-3.5 architecture.
We design zero-shot chain-of-thought (CoT) and few-shot CoT settings using in-context learning for ChatGPT.
Our evaluation involves assessing ChatGPT's performance on five official test sets in three different languages, along with three document-level GEC test sets in English.
arXiv Detail & Related papers (2023-04-04T12:33:40Z) - ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction
Benchmark [11.36853733574956]
ChatGPT is a cutting-edge artificial intelligence language model developed by OpenAI.
We compare it with a commercial GEC product (e.g., Grammarly) and state-of-the-art models (e.g., GECToR).
We find that ChatGPT does not perform as well as those baselines in terms of automatic evaluation metrics.
arXiv Detail & Related papers (2023-03-15T00:35:50Z) - Is ChatGPT a Good NLG Evaluator? A Preliminary Study [121.77986688862302]
We provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric.
Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments.
We hope our preliminary study could prompt the emergence of a general-purpose, reliable NLG metric.
arXiv Detail & Related papers (2023-03-07T16:57:20Z) - Improving Chinese Spelling Check by Character Pronunciation Prediction:
The Effects of Adaptivity and Granularity [76.20568599642799]
Chinese spelling check (CSC) is a fundamental NLP task that detects and corrects spelling errors in Chinese texts.
In this paper, we consider introducing an auxiliary task of Chinese pronunciation prediction (CPP) to improve CSC.
We propose SCOPE, which builds two parallel decoders on top of a shared encoder: one for the primary CSC task and the other for a fine-grained auxiliary CPP task.
arXiv Detail & Related papers (2022-10-20T03:42:35Z) - Reducing Spelling Inconsistencies in Code-Switching ASR using
Contextualized CTC Loss [5.707652271634435]
We propose Contextualized Connectionist Temporal Classification (CCTC) loss to encourage spelling consistencies.
CCTC loss does not require frame-level alignments, since the context ground truth is obtained from the model's estimated path.
Compared to the same model trained with regular CTC loss, our method consistently improved ASR performance on both code-switching (CS) and monolingual corpora.
arXiv Detail & Related papers (2020-05-16T09:36:58Z)