Mixture of Small and Large Models for Chinese Spelling Check
- URL: http://arxiv.org/abs/2506.06887v1
- Date: Sat, 07 Jun 2025 18:29:10 GMT
- Title: Mixture of Small and Large Models for Chinese Spelling Check
- Authors: Ziheng Qiao, Houquan Zhou, Zhenghua Li,
- Abstract summary: In the era of large language models (LLMs), the Chinese Spelling Check (CSC) task has seen various LLM methods developed. Fine-tuned BERT-based models, relying on high-quality in-domain data, show excellent performance but suffer from edit pattern overfitting. This paper proposes a novel dynamic mixture approach that effectively combines the probability distributions of small models and LLMs during the beam search decoding phase.
- Score: 10.634101727583127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of large language models (LLMs), the Chinese Spelling Check (CSC) task has seen various LLM methods developed, yet their performance remains unsatisfactory. In contrast, fine-tuned BERT-based models, relying on high-quality in-domain data, show excellent performance but suffer from edit pattern overfitting. This paper proposes a novel dynamic mixture approach that effectively combines the probability distributions of small models and LLMs during the beam search decoding phase, achieving a balanced enhancement of precise corrections from small models and the fluency of LLMs. This approach also eliminates the need for fine-tuning LLMs, saving significant time and resources, and facilitating domain adaptation. Comprehensive experiments demonstrate that our mixture approach significantly boosts error correction capabilities, achieving state-of-the-art results across multiple datasets. Our code is available at https://github.com/zhqiao-nlp/MSLLM.
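The core technique is a token-level mixture of two next-token distributions inside beam search: the fine-tuned small model contributes precise corrections, while the LLM contributes fluency. Below is a minimal Python sketch of that idea; the fixed interpolation weight `alpha`, the toy models, and the beam bookkeeping are illustrative assumptions rather than the paper's dynamic mixing rule (the linked repository contains the actual implementation).

```python
import numpy as np

def mix_distributions(p_small: np.ndarray, p_llm: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Linearly interpolate two next-token distributions over a shared vocabulary.

    `alpha` is a fixed illustrative weight; the paper mixes the distributions
    dynamically, which this sketch does not reproduce.
    """
    mixed = alpha * p_small + (1.0 - alpha) * p_llm
    return mixed / mixed.sum()  # renormalize for numerical safety

def beam_search_step(beams, p_small_fn, p_llm_fn, alpha=0.5, beam_width=4):
    """Expand each beam with the mixed distribution and keep the top candidates.

    beams: list of (token_id_sequence, log_score) tuples.
    p_small_fn / p_llm_fn: callables returning a vocab-sized next-token
    distribution given a prefix; stand-ins for the two models.
    """
    candidates = []
    for seq, score in beams:
        mixed = mix_distributions(p_small_fn(seq), p_llm_fn(seq), alpha)
        top_ids = np.argsort(mixed)[-beam_width:]  # best next tokens for this beam
        for tok in top_ids:
            candidates.append((seq + [int(tok)], score + float(np.log(mixed[tok]))))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_width]

# Toy usage with random "models" over a 10-token vocabulary.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab = 10
    fake_model = lambda seq: rng.dirichlet(np.ones(vocab))
    beams = [([], 0.0)]
    for _ in range(3):  # decode three steps
        beams = beam_search_step(beams, fake_model, fake_model, alpha=0.7)
    print(beams[0])
```

In the paper the mixing weight is computed per decoding step rather than held fixed; that dynamic component is deliberately omitted here.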
Related papers
- EULER: Enhancing the Reasoning Ability of Large Language Models through Error-Induced Learning [66.82956219777763]
Large Language Models (LLMs) have demonstrated strong reasoning capabilities. The Error-IndUced LEaRning (EULER) model aims to develop an error exposure model that generates high-quality solution errors.
arXiv Detail & Related papers (2025-05-28T08:57:03Z) - Test-Time Learning for Large Language Models [33.11605667376906]
We propose a Test-Time Learning (TTL) paradigm for Large Language Models (LLMs), in which the LLM dynamically adapts to target domains using only unlabeled test data during testing. We demonstrate through experiments that TLM improves performance by at least 20% compared to original LLMs on domain knowledge adaptation.
arXiv Detail & Related papers (2025-05-27T02:18:59Z) - Large Language Models are Few-shot Multivariate Time Series Classifiers [23.045734479292356]
Large Language Models (LLMs) have been extensively applied in time series analysis. Yet, their utility in few-shot classification (a crucial training scenario) is underexplored. We aim to leverage the extensive pre-trained knowledge in LLMs to overcome the data scarcity problem.
arXiv Detail & Related papers (2025-01-30T03:59:59Z) - Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [61.99353167168545]
We show that fine-tuning with LLM-generated data improves target task performance and reduces non-target task degradation. This is the first work to provide an empirical explanation based on token perplexity reduction to mitigate catastrophic forgetting in LLMs after fine-tuning.
arXiv Detail & Related papers (2025-01-24T08:18:56Z) - CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models [79.46938238953916]
Fine-tuning large language models (LLMs) for diverse applications is crucial to meet complex demands.
Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs.
In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs.
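For context, the delta-decomposition idea this entry refers to can be sketched with a plain truncated SVD of the weight difference. This is only the generic low-rank baseline the abstract mentions, not the mixed-precision scheme Delta-CoMe itself proposes, and the matrix sizes and rank below are arbitrary assumptions.

```python
import numpy as np

def compress_delta_low_rank(w_base: np.ndarray, w_finetuned: np.ndarray, rank: int):
    """Return a rank-`rank` factorization (U, V) of the delta weights.

    Generic low-rank compression of (fine-tuned - base) weights; not the
    mixed-precision method proposed by Delta-CoMe.
    """
    delta = w_finetuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    u_r = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    v_r = vt[:rank, :]
    return u_r, v_r

def reconstruct(w_base: np.ndarray, u_r: np.ndarray, v_r: np.ndarray) -> np.ndarray:
    """Approximate the fine-tuned weights from the base weights and the factors."""
    return w_base + u_r @ v_r

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_base = rng.standard_normal((64, 64))
    w_ft = w_base + 0.01 * rng.standard_normal((64, 64))  # small fine-tuning delta
    u_r, v_r = compress_delta_low_rank(w_base, w_ft, rank=8)
    err = np.linalg.norm(w_ft - reconstruct(w_base, u_r, v_r)) / np.linalg.norm(w_ft)
    print(f"relative reconstruction error: {err:.4f}")
```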
arXiv Detail & Related papers (2024-06-13T07:57:27Z) - DALD: Improving Logits-based Detector without Logits from Black-box LLMs [56.234109491884126]
Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing.
We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection.
DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
arXiv Detail & Related papers (2024-06-07T19:38:05Z) - LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization [9.517540904818986]
This paper proposes adaptive model quantization and phase-aware partition to improve LLM serving efficiency on heterogeneous GPU clusters.
Experiments on production inference workloads in 11 different clusters demonstrate that LLM-PQ achieves up to 2.88x (2.26x on average) throughput improvement in inference.
arXiv Detail & Related papers (2024-03-02T08:40:07Z) - Harnessing Large Language Models as Post-hoc Correctors [6.288056740658763]
We show that an LLM can work as a post-hoc corrector to propose corrections for the predictions of an arbitrary Machine Learning model.
We form a contextual knowledge database by incorporating the dataset's label information and the ML model's predictions on the validation dataset.
Our experimental results on text analysis and challenging molecular prediction tasks show that our approach improves the performance of a number of models by up to 39%.
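The corrector pipeline described in this entry amounts to: store (input, model prediction, gold label) triples from the validation set as contextual knowledge, then prompt an LLM with a new input, the ML model's prediction, and a few of those triples, and ask it to keep or revise the prediction. The sketch below is a hedged illustration of that flow; `query_llm` is a placeholder for any text-generation call and is not an API from the paper.

```python
from typing import Callable, List, Tuple

# A contextual knowledge entry: (input text, ML model prediction, gold label).
Example = Tuple[str, str, str]

def build_prompt(test_input: str, ml_prediction: str, context: List[Example]) -> str:
    """Assemble a correction prompt from validation-set examples and the new case."""
    lines = ["You are a post-hoc corrector for a machine learning model.",
             "Validation examples (input, model prediction, true label):"]
    for inp, pred, gold in context:
        lines.append(f"- input: {inp} | prediction: {pred} | label: {gold}")
    lines += [f"New input: {test_input}",
              f"Model prediction: {ml_prediction}",
              "Return the prediction you believe is correct (keep it if it looks right)."]
    return "\n".join(lines)

def correct_prediction(test_input: str, ml_prediction: str,
                       context: List[Example],
                       query_llm: Callable[[str], str]) -> str:
    """Ask the LLM to confirm or revise the ML model's prediction."""
    return query_llm(build_prompt(test_input, ml_prediction, context)).strip()

# Usage with a stubbed LLM that always answers "positive".
if __name__ == "__main__":
    kb = [("the movie was great", "negative", "positive")]
    stub_llm = lambda prompt: "positive"
    print(correct_prediction("a wonderful film", "negative", kb, stub_llm))
```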
arXiv Detail & Related papers (2024-02-20T22:50:41Z) - Certified Robustness for Large Language Models with Self-Denoising [42.916661225753145]
We propose to denoise the corrupted inputs with large language models (LLMs) in a self-denoising manner.
Our method outperforms the existing certification methods under both certified robustness and empirical robustness.
arXiv Detail & Related papers (2023-07-14T05:40:24Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We also examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)