In-Context Learning with Many Demonstration Examples
- URL: http://arxiv.org/abs/2302.04931v1
- Date: Thu, 9 Feb 2023 20:53:12 GMT
- Title: In-Context Learning with Many Demonstration Examples
- Authors: Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong
Wu, Lingpeng Kong
- Abstract summary: We propose a long-range language model EVALM based on an efficient transformer mechanism.
EVALM is trained with 8k tokens per batch line and can be tested on contexts of up to 256k tokens.
We find that in-context learning can achieve higher performance with more demonstrations under many-shot instruction tuning.
- Score: 26.39178386828271
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large pre-trained language models (PLMs) have shown promising in-context
learning abilities. However, due to the backbone transformer architecture,
existing PLMs are bottlenecked by the memory and computational cost when
scaling up to a large context size, leaving instruction tuning and in-context
learning of many demonstration examples, as well as long-range language
modeling under-explored. In this study, we propose a long-range language model
EVALM based on an efficient transformer mechanism. EVALM is trained with 8k
tokens per batch line and can be tested on contexts of up to 256k tokens with
extrapolation, 128 times the limit of existing PLMs (e.g., GPT-3). Based on
EVALM, we scale up the size of examples efficiently in both instruction tuning
and in-context learning to explore the boundary of the benefits from more
annotated data. Experimental results on a diverse set of tasks show that EVALM
achieves 4.1% higher accuracy on average, and the average context length at which
tasks reach their best accuracy is around 12k tokens. We find that in-context
learning achieves higher performance with more demonstrations under many-shot
instruction tuning (8k), and extending the instruction length to 16k further
raises the upper bound of scaling in-context learning.
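A minimal sketch of the many-shot in-context learning setup described above: pack as many labelled demonstrations as the context budget allows in front of the test input. The tokenizer, the "Input/Output" demonstration format, and the helper name below are illustrative assumptions, not the authors' EVALM implementation; the 8k budget simply mirrors the many-shot setting reported in the abstract.

```python
# Hypothetical sketch of many-shot prompt construction (not the EVALM codebase).
from transformers import AutoTokenizer

def build_many_shot_prompt(instruction, demonstrations, test_input,
                           tokenizer, max_context_tokens=8192):
    """Concatenate (input, label) demonstrations until the token budget is spent."""
    header = instruction.strip() + "\n\n"
    footer = f"Input: {test_input}\nOutput:"
    budget = max_context_tokens - len(tokenizer.encode(header + footer))

    shots = []
    for x, y in demonstrations:
        shot = f"Input: {x}\nOutput: {y}\n\n"
        cost = len(tokenizer.encode(shot))
        if cost > budget:
            break  # stop once the next demonstration would overflow the context
        shots.append(shot)
        budget -= cost
    return header + "".join(shots) + footer

# Example usage with a generic tokenizer; any long-context model could consume the prompt.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt = build_many_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("great movie", "positive"), ("terrible plot", "negative")] * 500,
    "the acting was wonderful",
    tokenizer,
)
```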
Related papers
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs [84.24644520272835]
Large language models (LLMs) showcase varied multilingual capabilities across tasks like translation, code generation, and reasoning.
Previous assessments often limited their scope to fundamental natural language processing (NLP) tasks or isolated capability-specific tasks.
We present a pipeline for selecting available and reasonable benchmarks from massive ones, addressing the oversight in previous work regarding the utility of these benchmarks.
We introduce P-MMEval, a large-scale benchmark covering effective fundamental and capability-specialized datasets.
arXiv Detail & Related papers (2024-11-14T01:29:36Z)
- EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing in both accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z)
- DELIA: Diversity-Enhanced Learning for Instruction Adaptation in Large Language Models [11.77848664657788]
We show that instruction tuning is primarily a process where the model fits to specific task formats, rather than acquiring new knowledge or capabilities.
We propose that this limitation stems from biased features learned during instruction tuning, which differ from ideal task-specific features.
We use our novel data synthesis method, DELIA, to transform biased features in instruction tuning into approximations of ideal features.
arXiv Detail & Related papers (2024-08-19T17:56:06Z)
- Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning [68.43706033424378]
This study introduces an innovative method designed to efficiently increase in-context text length in multi-modal large language models (MLLMs).
We present Visualized In-Context Text Processing (VisInContext), which processes long in-context text using visual tokens.
This technique significantly reduces GPU memory usage and floating point operations (FLOPs) for both the training and inference stages.
arXiv Detail & Related papers (2024-06-04T17:59:25Z)
- Towards Robust Instruction Tuning on Multimodal Large Language Models [25.506776502317436]
In this work, we introduce an automatic instruction augmentation method named INSTRAUG for multimodal tasks.
Results on two popular multimodal instruction-following benchmarks show that INSTRAUG can significantly improve the alignment of multimodal large language models (MLLMs) across 12 multimodal tasks.
arXiv Detail & Related papers (2024-02-22T12:35:50Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated, Transformer language model, which we call the Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z)
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference speed while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
- CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for using PLMs that address the efficiency issues of pre-training, fine-tuning, and inference.
We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch.
We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
arXiv Detail & Related papers (2021-06-20T15:43:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.