Small Models are Valuable Plug-ins for Large Language Models
- URL: http://arxiv.org/abs/2305.08848v1
- Date: Mon, 15 May 2023 17:59:01 GMT
- Title: Small Models are Valuable Plug-ins for Large Language Models
- Authors: Canwen Xu and Yichong Xu and Shuohang Wang and Yang Liu and Chenguang
Zhu and Julian McAuley
- Abstract summary: Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their weights are often publicly unavailable.
We propose Super In-Context Learning (SuperICL) which allows black-box LLMs to work with locally fine-tuned smaller models.
- Score: 65.29370906766997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their
weights are often publicly unavailable and their immense sizes make the models
difficult to be tuned with common hardware. As a result, effectively tuning
these models with large-scale supervised data can be challenging. As an
alternative, In-Context Learning (ICL) can only use a small number of
supervised examples due to context length limits. In this paper, we propose
Super In-Context Learning (SuperICL) which allows black-box LLMs to work with
locally fine-tuned smaller models, resulting in superior performance on
supervised tasks. Our experiments demonstrate that SuperICL can improve
performance beyond state-of-the-art fine-tuned models while addressing the
instability problem of in-context learning. Furthermore, SuperICL can enhance
the capabilities of smaller models, such as multilinguality and
interpretability.
Related papers
- Improving Large Models with Small models: Lower Costs and Better Performance [81.55672406002715]
We propose Data Shunt$+$ (DS$+$), a general paradigm for collaboration of small and large models.
For instance, ChatGPT achieves an accuracy of $94.43%$ on Amazon Product sentiment analysis, and DS$+$ achieves an accuracy of $95.64%$, while the cost has been reduced to only $31.18%$.
arXiv Detail & Related papers (2024-06-15T14:44:43Z) - Grimoire is All You Need for Enhancing Large Language Models [13.111331915718527]
We propose a method SLEICL that involves learning from examples using strong language models and then summarizing and transferring these learned skills to weak language models for inference and application.
Our experiments, conducted on up to eight datasets with five language models, demonstrate that weak language models achieve consistent improvement over their own zero-shot or few-shot capabilities using the SLEICL method.
arXiv Detail & Related papers (2024-01-07T04:32:29Z) - Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks [5.536630285985836]
We introduce parameter-efficient sparsity crafting (PESC)
PESC crafts dense models into sparse models using the mixture-of-experts (MoE) architecture.
Our best sparse model outperforms other sparse and dense models and exhibits superior general capabilities compared to GP3.5.
arXiv Detail & Related papers (2024-01-05T09:58:09Z) - Improving In-context Learning via Bidirectional Alignment [41.214003703218914]
Large language models (LLMs) have shown impressive few-shot generalization on many tasks via in-context learning (ICL)
We propose Bidirectional Alignment (BiAlign) to fully leverage the models' preferences for ICL examples to improve the ICL abilities of student models.
Specifically, we introduce the alignment of input preferences between student and teacher models by incorporating a novel ranking loss.
arXiv Detail & Related papers (2023-12-28T15:02:03Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme
Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - Empower Your Model with Longer and Better Context Comprehension [15.377707808279908]
We investigate the nature of information transfer within Large Language Models (LLMs)
We propose a novel technique called Attention Transition to empower models to achieve longer and better context comprehension.
Our experiments are conducted on the challenging XSum dataset using LLaMa-7b model with context token length ranging from 800 to 1900.
arXiv Detail & Related papers (2023-07-25T09:34:42Z) - Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z) - Emergent Abilities of Large Language Models [172.08007363384218]
We consider an ability to be emergent if it is not present in smaller models but is present in larger models.
The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.
arXiv Detail & Related papers (2022-06-15T17:32:01Z) - Efficient Large Scale Language Modeling with Mixtures of Experts [61.45159383372181]
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation.
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings.
arXiv Detail & Related papers (2021-12-20T17:05:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.