A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check
- URL: http://arxiv.org/abs/2307.13655v1
- Date: Tue, 25 Jul 2023 17:02:38 GMT
- Title: A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check
- Authors: Xunjian Yin and Xiaojun Wan
- Abstract summary: We show that using phonetic and graphic information reasonably is effective for Chinese Spelling Check.
Models are sensitive to the error distribution of the test set, which reflects the shortcomings of models.
The commonly used benchmark, SIGHAN, cannot reliably evaluate models' performance.
- Score: 53.152011258252315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of pre-trained models and the incorporation of phonetic
and graphic information, neural models have achieved high scores in Chinese
Spelling Check (CSC). However, these high scores do not comprehensively
reflect the models' capability, because the existing test sets are limited. In
this study, we abstract the representative model paradigm, implement it with
nine structures, and evaluate them on comprehensive test sets that we
constructed for different purposes. We perform a detailed analysis of the
results and find that: 1)
Fusing phonetic and graphic information reasonably is effective for CSC. 2)
Models are sensitive to the error distribution of the test set, which reflects
the shortcomings of models and reveals the direction we should work on. 3)
Whether or not the errors and contexts have been seen has a significant impact
on models. 4) The commonly used benchmark, SIGHAN, cannot reliably evaluate
models' performance.
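To make finding 1) concrete, the snippet below is a minimal sketch of one common way phonetic (pinyin) and graphic (glyph) features might be fused with character embeddings for CSC. It is not the paper's evaluated architecture; the module names, dimensions, and gated-sum fusion are illustrative assumptions.

```python
# Minimal sketch (PyTorch, assumed): fusing character, phonetic (pinyin) and
# graphic (glyph) signals with a learned gate before a Transformer encoder.
# Vocabulary sizes, dimensions, and the gated-sum fusion are illustrative only.
import torch
import torch.nn as nn


class FusedCSCEncoder(nn.Module):
    def __init__(self, char_vocab=6000, pinyin_vocab=430, glyph_dim=64,
                 hidden=256, num_layers=2):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, hidden)
        self.pinyin_emb = nn.Embedding(pinyin_vocab, hidden)
        # Glyph features (e.g. from a small CNN over character images) are
        # assumed to be precomputed and are projected to the hidden size.
        self.glyph_proj = nn.Linear(glyph_dim, hidden)
        # Per-position gate: how much character / phonetic / graphic signal to mix.
        self.gate = nn.Linear(3 * hidden, 3)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # CSC framed as per-position classification over the character vocabulary.
        self.classifier = nn.Linear(hidden, char_vocab)

    def forward(self, char_ids, pinyin_ids, glyph_feats):
        c = self.char_emb(char_ids)                  # (B, T, H)
        p = self.pinyin_emb(pinyin_ids)              # (B, T, H)
        g = self.glyph_proj(glyph_feats)             # (B, T, H)
        weights = torch.softmax(self.gate(torch.cat([c, p, g], dim=-1)), dim=-1)
        fused = (weights[..., 0:1] * c + weights[..., 1:2] * p
                 + weights[..., 2:3] * g)
        return self.classifier(self.encoder(fused))  # (B, T, char_vocab)


# Tiny smoke test with random inputs.
model = FusedCSCEncoder()
chars = torch.randint(0, 6000, (2, 16))
pinyins = torch.randint(0, 430, (2, 16))
glyphs = torch.randn(2, 16, 64)
print(model(chars, pinyins, glyphs).shape)  # torch.Size([2, 16, 6000])
```

The gate lets the model decide, per position, how much phonetic and graphic evidence to mix in, which is one plausible reading of "fusing the information reasonably"; the paper compares several such structures rather than prescribing this one.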
Related papers
- The Importance of Model Inspection for Better Understanding Performance Characteristics of Graph Neural Networks [15.569758991934934]
We investigate the effect of modelling choices on the feature learning characteristics of graph neural networks applied to a brain shape classification task.
We find substantial differences in the feature embeddings at different layers of the models.
arXiv Detail & Related papers (2024-05-02T13:26:18Z)
- Evaluating the Reliability of CNN Models on Classifying Traffic and Road Signs using LIME [1.188383832081829]
The study focuses on evaluating the accuracy of these models' predictions as well as their ability to employ appropriate features for image categorization.
To gain insights into the strengths and limitations of the model's predictions, the study employs the local interpretable model-agnostic explanations (LIME) framework.
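For context, below is a minimal sketch of how LIME's image explainer is typically applied to an image classifier. The stand-in classifier, image size, and class count are placeholders rather than that study's actual setup.

```python
# Minimal sketch (assumed setup): explaining one prediction of an image
# classifier with LIME. Replace `classifier_fn` and `image` with the real
# CNN's batched predict-probabilities function and a real traffic-sign image.
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

NUM_CLASSES = 43  # e.g. a traffic-sign label set; an assumption here


def classifier_fn(images):
    # LIME passes a batch of perturbed images (N, H, W, 3) and expects
    # class probabilities of shape (N, NUM_CLASSES). Random stand-in below.
    rng = np.random.default_rng(0)
    probs = rng.random((len(images), NUM_CLASSES))
    return probs / probs.sum(axis=1, keepdims=True)


image = np.random.rand(64, 64, 3)  # placeholder for a real input image

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    image, classifier_fn, top_labels=3, hide_color=0, num_samples=200)

# Superpixels that most support the top predicted class.
img, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True,
    num_features=5, hide_rest=False)
overlay = mark_boundaries(img, mask)
```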
arXiv Detail & Related papers (2023-09-11T18:11:38Z)
- Rethinking Masked Language Modeling for Chinese Spelling Correction [70.85829000570203]
We study Chinese Spelling Correction (CSC) as a joint decision made by two separate models: a language model and an error model.
We find that fine-tuning BERT tends to over-fit the error model while under-fitting the language model, resulting in poor generalization to out-of-distribution error patterns.
We demonstrate that a very simple strategy, randomly masking 20% of the non-error tokens from the input sequence during fine-tuning, is sufficient for learning a much better language model without sacrificing the error model.
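A minimal sketch of that masking strategy as a preprocessing step is shown below, assuming error positions are known from the parallel training pairs; the token ids and helper name are illustrative, not the paper's code.

```python
# Minimal sketch (assumed): mask 20% of the non-error source tokens during
# fine-tuning, so the model keeps learning language modeling, not just error
# patterns. Error positions come from comparing source and target sentences.
import random

MASK_TOKEN_ID = 103   # [MASK] id in Chinese BERT vocabularies; an assumption
MASK_RATE = 0.2


def mask_non_error_tokens(input_ids, error_positions, mask_rate=MASK_RATE):
    """Randomly replace a fraction of the non-error tokens with [MASK].

    input_ids: list[int], the (possibly misspelled) source sentence.
    error_positions: set[int], indices of tokens that are spelling errors.
    Returns a new list; error tokens are never masked.
    """
    masked = list(input_ids)
    candidates = [i for i in range(len(masked)) if i not in error_positions]
    num_to_mask = int(round(len(candidates) * mask_rate))
    for i in random.sample(candidates, num_to_mask):
        masked[i] = MASK_TOKEN_ID
    return masked


# Example: positions 1 and 4 are known spelling errors and stay visible.
source = [2769, 4263, 3221, 671, 702, 1962, 1957]
print(mask_non_error_tokens(source, error_positions={1, 4}))
```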
arXiv Detail & Related papers (2023-05-28T13:19:12Z)
- An Empirical Study of Deep Learning Models for Vulnerability Detection [4.243592852049963]
We surveyed and reproduced 9 state-of-the-art deep learning models on 2 widely used vulnerability detection datasets.
We investigated model capabilities, training data, and model interpretation.
Our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models.
arXiv Detail & Related papers (2022-12-15T19:49:34Z)
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models with few examples show strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows models gain performance improvement by capturing non-task-related features.
These observations alert that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z)
- Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls [1.3870303451896246]
We implement random forest and deep convolutional neural network models using several medical imaging datasets.
We show that violation of the independence assumption could substantially affect model generalizability.
Inappropriate performance indicators could lead to erroneous conclusions.
arXiv Detail & Related papers (2022-02-01T05:07:27Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- A Multi-Level Attention Model for Evidence-Based Fact Checking [58.95413968110558]
We present a simple model that can be trained on sequence structures.
Results on a large-scale dataset for Fact Extraction and VERification show that our model outperforms the graph-based approaches.
arXiv Detail & Related papers (2021-06-02T05:40:12Z)
- Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability [0.0]
The study compares state-of-the-art language models: BERT, ELECTRA, and BERT derivatives including RoBERTa, ALBERT and DistilBERT.
The experimental results establish a new state-of-the-art for the 2013 rating classification task and the Financial Phrasebank sentiment detection task, with 69% and 88.2% accuracy respectively.
arXiv Detail & Related papers (2020-09-09T04:31:44Z)
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.