Improving In-Context Learning with Small Language Model Ensembles
- URL: http://arxiv.org/abs/2410.21868v1
- Date: Tue, 29 Oct 2024 09:02:37 GMT
- Title: Improving In-Context Learning with Small Language Model Ensembles
- Authors: M. Mehdi Mojarradi, Lingyi Yang, Robert McCraith, Adam Mahdi,
- Abstract summary: In-context learning (ICL) is a cheap and efficient alternative but cannot match the accuracies of advanced methods.
We present Ensemble SuperICL, a novel approach that enhances ICL by leveraging the expertise of multiple fine-tuned small language models (SLMs)
- Score: 2.3499129784547654
- License:
- Abstract: Large language models (LLMs) have shown impressive capabilities across various tasks, but their performance on domain-specific tasks remains limited. While methods like retrieval augmented generation and fine-tuning can help to address this, they require significant resources. In-context learning (ICL) is a cheap and efficient alternative but cannot match the accuracies of advanced methods. We present Ensemble SuperICL, a novel approach that enhances ICL by leveraging the expertise of multiple fine-tuned small language models (SLMs). Ensemble SuperICL achieves state of the art (SoTA) results on several natural language understanding benchmarks. Additionally, we test it on a medical-domain labelling task and showcase its practicality by using off-the-shelf SLMs fine-tuned on a general language task, achieving superior accuracy in large-scale data labelling compared to all baselines. Finally, we conduct an ablation study and sensitivity analyses to elucidate the underlying mechanism of Ensemble SuperICL. Our research contributes to the growing demand for efficient domain specialisation methods in LLMs, offering a cheap and effective method for practitioners.
Related papers
- A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and performance to perform various language tasks with minimal computational resources.
We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z) - Benchmarking General-Purpose In-Context Learning [19.40952728849431]
In-context learning (ICL) empowers generative models to address new tasks effectively and efficiently on the fly.
In this paper, we study extending ICL to address a broader range of tasks with an extended learning horizon and higher improvement potential.
We introduce two benchmarks specifically crafted to train and evaluate GPICL functionalities.
arXiv Detail & Related papers (2024-05-27T14:50:42Z) - Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning [41.606494950216764]
In-context Learning (ICL) has emerged as a powerful capability alongside the development of scaled-up large language models (LLMs)
This paper decomposes the overall performance of ICL into three dimensions, label space, format, and discrimination.
We show that ICL exhibits significant efficacy in regulating the label space and format, which helps LLMs respond to desired label words.
arXiv Detail & Related papers (2024-04-11T08:20:10Z) - Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-ization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z) - Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking [21.799697177859898]
In this paper, we explore using an in-context learning method named RS-LLM (Rich Semantic based LLMs) to introduce large language models (LLMs) as the foundation model.
We found that by introducing a small number of specific Chinese rich semantic structures, LLMs achieve better performance than the BERT-based model on few-shot CSC task.
arXiv Detail & Related papers (2024-03-13T12:55:43Z) - Analyzing and Adapting Large Language Models for Few-Shot Multilingual
NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z) - Large Language Models Are Latent Variable Models: Explaining and Finding
Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.