A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check
- URL: http://arxiv.org/abs/2307.13655v1
- Date: Tue, 25 Jul 2023 17:02:38 GMT
- Title: A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check
- Authors: Xunjian Yin and Xiaojun Wan
- Abstract summary: We show that fusing phonetic and graphic information reasonably is effective for Chinese Spelling Check.
Models are sensitive to the error distribution of the test set, which reflects the models' shortcomings.
The commonly used benchmark, SIGHAN, cannot reliably evaluate models' performance.
- Score: 53.152011258252315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of pre-trained models and the incorporation of phonetic
and graphic information, neural models have achieved high scores in Chinese
Spelling Check (CSC). However, these scores do not comprehensively reflect the
models' capabilities because the test sets are limited. In this study, we
abstract the representative model paradigm, implement it with nine structures,
and evaluate them on comprehensive test sets we constructed for different
purposes. We perform a detailed analysis of the results and find that: 1)
Fusing phonetic and graphic information reasonably is effective for CSC. 2)
Models are sensitive to the error distribution of the test set, which reflects
the shortcomings of models and reveals the direction we should work on. 3)
Whether or not the errors and contexts have been seen has a significant impact
on models. 4) The commonly used benchmark, SIGHAN, cannot reliably evaluate
models' performance.
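The abstract's first finding concerns fusing phonetic (pinyin) and graphic (glyph) character features. As a rough illustration of what such fusion can look like, the sketch below combines three per-character embeddings with a softmax gate. This is illustrative only: the paper compares nine concrete structures, and the gate here is derived from the embeddings themselves as a stand-in for the learned gates used in real CSC models; the function name is hypothetical.

```python
import numpy as np

def fuse_modalities(char_emb, pinyin_emb, glyph_emb):
    """Gated fusion of character, phonetic (pinyin), and graphic (glyph)
    embeddings for one character.

    Minimal sketch, not the paper's implementation: real models learn
    the gate parameters; here each modality's weight is computed from a
    crude per-modality score so the example stays self-contained.
    """
    stacked = np.stack([char_emb, pinyin_emb, glyph_emb])  # shape (3, d)
    scores = stacked.sum(axis=1)                           # one score per modality
    weights = np.exp(scores) / np.exp(scores).sum()        # softmax gate over modalities
    return (weights[:, None] * stacked).sum(axis=0)        # weighted sum, shape (d,)
```

The fused vector would then feed a correction head that predicts the intended character, letting sound- and shape-based confusions inform the decision.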
Related papers
- Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks [59.47851630504264]
Free-text explanations are expressive and easy to understand, but many datasets lack annotated explanation data.
We fine-tune T5-Large and OLMo-7B models and assess the impact of fine-tuning data quality, the number of fine-tuning samples, and few-shot selection methods.
The models are evaluated on 19 diverse OOD datasets across three tasks: natural language inference (NLI), fact-checking, and hallucination detection in abstractive summarization.
arXiv Detail & Related papers (2025-02-07T10:01:32Z)
- Tests for model misspecification in simulation-based inference: from local distortions to global model checks [2.0209172586699173]
We provide a solid and flexible foundation for a wide range of model discrepancy analysis tasks.
We make explicit analytic connections to classical techniques: anomaly detection, model validation, and goodness-of-fit residual analysis.
We show how to conduct such a distortion-driven model misspecification test for real gravitational wave data, specifically on the event GW150914.
arXiv Detail & Related papers (2024-12-19T17:48:03Z)
- The Importance of Model Inspection for Better Understanding Performance Characteristics of Graph Neural Networks [15.569758991934934]
We investigate the effect of modelling choices on the feature learning characteristics of graph neural networks applied to a brain shape classification task.
We find substantial differences in the feature embeddings at different layers of the models.
arXiv Detail & Related papers (2024-05-02T13:26:18Z)
- Evaluating the Reliability of CNN Models on Classifying Traffic and Road Signs using LIME [1.188383832081829]
The study focuses on evaluating the accuracy of these models' predictions as well as their ability to employ appropriate features for image categorization.
To gain insights into the strengths and limitations of the model's predictions, the study employs the local interpretable model-agnostic explanations (LIME) framework.
arXiv Detail & Related papers (2023-09-11T18:11:38Z)
- Rethinking Masked Language Modeling for Chinese Spelling Correction [70.85829000570203]
We study Chinese Spelling Correction (CSC) as a joint decision made by two separate models: a language model and an error model.
We find that fine-tuning BERT tends to over-fit the error model while under-fitting the language model, resulting in poor generalization to out-of-distribution error patterns.
We demonstrate that a very simple strategy, randomly masking 20% of the non-error tokens from the input sequence during fine-tuning, is sufficient for learning a much better language model without sacrificing the error model.
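The masking strategy this entry describes is concrete enough to sketch: during fine-tuning, 20% of the tokens that are not spelling errors are replaced with a mask symbol, forcing the model to rely on its language model rather than memorized error patterns. The sketch below assumes BERT-style `[MASK]` tokens; the function name and signature are illustrative, not from the paper's code.

```python
import random

MASK = "[MASK]"

def mask_non_error_tokens(tokens, error_positions, mask_rate=0.2, seed=0):
    """Randomly replace a fraction of the non-error tokens with [MASK].

    Minimal sketch of the masked fine-tuning strategy described above:
    error positions are left intact so the error model still sees them,
    while masking elsewhere pushes the language model to generalize.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    # Only tokens outside the known error positions are candidates for masking.
    candidates = [i for i in range(len(tokens)) if i not in error_positions]
    n_mask = int(len(candidates) * mask_rate)
    for i in rng.sample(candidates, n_mask):
        masked[i] = MASK
    return masked
```

In training, the masked sequence replaces the original input while the correction targets stay unchanged, so the 20% rate trades a little input fidelity for a stronger language model.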
arXiv Detail & Related papers (2023-05-28T13:19:12Z)
- An Empirical Study of Deep Learning Models for Vulnerability Detection [4.243592852049963]
We surveyed and reproduced 9 state-of-the-art deep learning models on 2 widely used vulnerability detection datasets.
We investigated model capabilities, training data, and model interpretation.
Our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models.
arXiv Detail & Related papers (2022-12-15T19:49:34Z)
- Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls [1.3870303451896246]
We implement random forest and deep convolutional neural network models using several medical imaging datasets.
We show that violation of the independence assumption could substantially affect model generalizability.
Inappropriate performance indicators could lead to erroneous conclusions.
arXiv Detail & Related papers (2022-02-01T05:07:27Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- A Multi-Level Attention Model for Evidence-Based Fact Checking [58.95413968110558]
We present a simple model that can be trained on sequence structures.
Results on a large-scale dataset for Fact Extraction and VERification show that our model outperforms the graph-based approaches.
arXiv Detail & Related papers (2021-06-02T05:40:12Z)
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.