Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking
- URL: http://arxiv.org/abs/2403.08492v3
- Date: Wed, 20 Nov 2024 04:00:39 GMT
- Title: Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking
- Authors: Ming Dong, Yujing Chen, Miao Zhang, Hao Sun, Tingting He
- Abstract summary: In this paper, we explore using an in-context learning method named RS-LLM (Rich Semantic based LLMs) to introduce large language models (LLMs) as the foundation model.
We found that by introducing a small number of specific Chinese rich semantic structures, LLMs achieve better performance than BERT-based models on the few-shot CSC task.
- Score: 21.799697177859898
- Abstract: Chinese Spell Checking (CSC) is a widely used technology that plays a vital role in speech-to-text (STT) and optical character recognition (OCR). Most existing CSC approaches rely on the BERT architecture and achieve excellent performance. However, limited by the scale of the foundation model, BERT-based methods do not work well in few-shot scenarios, which limits their practical application. In this paper, we explore an in-context learning method named RS-LLM (Rich Semantic based LLMs) that introduces large language models (LLMs) as the foundation model. We also study the impact of introducing various kinds of Chinese rich semantic information into our framework. We found that by introducing a small number of specific Chinese rich semantic structures, LLMs achieve better performance than BERT-based models on the few-shot CSC task. Furthermore, we conduct experiments on multiple datasets, and the experimental results verify the superiority of our proposed framework.
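The abstract frames RS-LLM as few-shot in-context learning in which each demonstration is paired with Chinese rich semantic structures. Below is a minimal sketch of that prompting pattern, assuming pinyin and word segmentation as the semantic annotations and a placeholder `call_llm` function; the demonstration format is illustrative, not the authors' actual template or pipeline.

```python
# Illustrative few-shot CSC prompt with rich semantic annotations.
# The annotation types (pinyin, segmentation) and `call_llm` are assumptions,
# not the RS-LLM paper's exact design.
from typing import Dict, List


def build_prompt(demos: List[Dict[str, str]], query: str) -> str:
    """Assemble a few-shot prompt whose demonstrations carry semantic info."""
    parts = ["Correct the spelling errors in the Chinese sentence."]
    for d in demos:
        parts.append(
            f"Input: {d['input']}\n"
            f"Semantic info (pinyin / segmentation): {d['semantics']}\n"
            f"Output: {d['output']}"
        )
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


demos = [
    {
        "input": "我今天非常高心。",
        "semantics": "wo3 jin1tian1 fei1chang2 gao1xin1 | 我/今天/非常/高心",
        "output": "我今天非常高兴。",  # 高心 -> 高兴 (phonetically similar error)
    },
]

prompt = build_prompt(demos, "他的身体一直很建康。")  # 建康 -> 健康
# response = call_llm(prompt)  # hypothetical call to any chat-style LLM API
print(prompt)
```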
Related papers
- Improving In-Context Learning with Small Language Model Ensembles [2.3499129784547654]
In-context learning (ICL) is a cheap and efficient alternative but cannot match the accuracy of advanced methods.
We present Ensemble SuperICL, a novel approach that enhances ICL by leveraging the expertise of multiple fine-tuned small language models (SLMs).
arXiv Detail & Related papers (2024-10-29T09:02:37Z)
- A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and their ability to perform various language tasks with minimal computational resources.
We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z)
- Making Text Embedders Few-Shot Learners [33.50993377494602]
We introduce a novel model bge-en-icl, which employs few-shot examples to produce high-quality text embeddings.
Our approach integrates task-related examples directly into the query side, resulting in significant improvements across various tasks.
Experimental results on the MTEB and AIR-Bench benchmarks demonstrate that our approach sets new state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-09-24T03:30:19Z)
- Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning [15.919493497867567]
This study aims to evaluate the performance of Multimodal Large Language Models (MLLMs) on the VALSE benchmark.
We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in model size and pretraining datasets.
arXiv Detail & Related papers (2024-07-17T11:26:47Z)
- MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting [53.77590764277568]
We introduce a novel MoE-CT architecture that separates the base model's learning from the multilingual expansion process.
Our design freezes the original LLM parameters, thus safeguarding its performance in high-resource languages, while an appended MoE module, trained on diverse language datasets, augments low-resource language proficiency.
arXiv Detail & Related papers (2024-06-25T11:03:45Z)
- Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task [3.1459398432526267]
This paper proposes Language model-based Knowledge Tracing (LKT), a novel framework that integrates pre-trained language models (PLMs) with Knowledge Tracing methods.
LKT effectively incorporates textual information and significantly outperforms previous KT models on large benchmark datasets.
arXiv Detail & Related papers (2024-06-05T03:26:59Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT), and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on six high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z)
- Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning [57.74233319453229]
Large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task.
We propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus.
Our experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT sets a new state of the art.
arXiv Detail & Related papers (2023-10-17T03:21:43Z)
- OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT-4V and Gemini, on various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z)
- LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization [19.89228774074371]
We propose a simple yet effective pre-training method named LICHEE to efficiently incorporate multi-grained information of input text.
Our method can be applied to various pre-trained language models and improve their representation capability.
arXiv Detail & Related papers (2021-08-02T12:08:19Z)