Efficient Prompting via Dynamic In-Context Learning
- URL: http://arxiv.org/abs/2305.11170v1
- Date: Thu, 18 May 2023 17:58:31 GMT
- Title: Efficient Prompting via Dynamic In-Context Learning
- Authors: Wangchunshu Zhou, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan
- Abstract summary: We propose DynaICL, a recipe for efficient prompting with black-box generalist models.
DynaICL dynamically allocates in-context examples according to the input complexity and the computational budget.
We find that DynaICL saves up to 46% of the token budget compared to the common practice of allocating the same number of in-context examples to each input.
- Score: 76.83516913735072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The primary way of building AI applications is shifting from training
specialist models to prompting generalist models. A common practice for
prompting generalist models, often referred to as in-context learning, is to
append a few examples (demonstrations) to the prompt to help the model better
understand the task. While effective, in-context learning can be inefficient
because it makes the input prompt much longer, consuming valuable space in the
context window and leading to larger computational costs. In this paper, we
propose DynaICL, a recipe for efficient prompting with black-box generalist
models that dynamically allocates in-context examples according to the input
complexity and the computational budget. To achieve this, we train a meta
controller that predicts the number of in-context examples suitable for the
generalist model to make a good prediction based on the performance-efficiency
trade-off for a specific input. We then dynamically allocate the number of
demonstrations for an input according to predictions from the meta controller
and the given computation budget. Experimental results show that dynamic
example allocation helps achieve a better performance-efficiency trade-off in
two practical settings where computational resources or the required
performance is constrained. Specifically, DynaICL saves up to 46% of the token budget
compared to the common practice that allocates the same number of in-context
examples to each input. We also find that a meta controller trained on a
certain backbone model and tasks can successfully generalize to unseen models
and tasks.
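The allocation step described in the abstract can be illustrated with a minimal sketch. It assumes the meta controller has already produced a per-input count of demonstrations; the abstract does not specify how predictions are reconciled with the budget, so the greedy trimming rule below, along with the names `allocate_examples` and `build_prompt`, is a hypothetical stand-in rather than the authors' procedure.

```python
from typing import List, Sequence


def allocate_examples(
    predicted_k: Sequence[int],
    total_budget: int,
    max_k: int,
) -> List[int]:
    """Turn per-input predictions into an allocation that fits a shared budget.

    predicted_k  : number of demonstrations a (hypothetical) meta controller
                   suggests for each input
    total_budget : total number of demonstrations allowed across all inputs
    max_k        : per-input cap on demonstrations

    Assumption: the trimming rule is illustrative only, not the paper's method.
    """
    if total_budget < 0:
        raise ValueError("total_budget must be non-negative")

    # Clip every prediction into the valid range [0, max_k].
    alloc = [min(max(k, 0), max_k) for k in predicted_k]

    # While over budget, remove one demonstration from the input that
    # currently holds the most (ties go to the earliest input).
    while sum(alloc) > total_budget:
        i = max(range(len(alloc)), key=alloc.__getitem__)
        alloc[i] -= 1
    return alloc


def build_prompt(demos: Sequence[str], query: str, k: int) -> str:
    """Prepend the first k demonstrations to the query, separated by blank lines."""
    return "\n\n".join([*demos[:k], query])


if __name__ == "__main__":
    # Four inputs; a uniform 2-shot baseline would also spend 8 demonstrations.
    allocation = allocate_examples([0, 2, 5, 8], total_budget=8, max_k=5)
    print(allocation)
    print(build_prompt(["Q: 1+1? A: 2", "Q: 2+2? A: 4"], "Q: 3+3? A:", k=1))
```

With the same total of eight demonstrations as a uniform 2-shot baseline, this sketch gives none to the input predicted to be easy and more to the inputs predicted to be hard, which is the trade-off the abstract describes.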
Related papers
- Learning to Reduce: Optimal Representations of Structured Data in Prompting Large Language Models [42.16047343029512]
Large Language Models (LLMs) have been widely used as general-purpose AI agents.
We propose a framework, Learning to Reduce, that fine-tunes a language model to generate a reduced version of an input context.
We show that our model achieves comparable accuracies in selecting the relevant evidence from an input context.
arXiv Detail & Related papers (2024-02-22T00:41:23Z)
- Unlearnable Algorithms for In-context Learning [36.895152458323764]
In this paper, we focus on efficient unlearning methods for the task adaptation phase of a pretrained large language model.
We observe that an LLM's ability to do in-context learning for task adaptation allows for efficient exact unlearning of task adaptation training data.
We propose a new holistic measure of unlearning cost which accounts for varying inference costs.
arXiv Detail & Related papers (2024-02-01T16:43:04Z)
- Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers [54.83459025465947]
Even the largest models struggle with compositional reasoning, generalization, fine-grained spatial and temporal reasoning, and counting.
Visual reasoning with large language models (LLMs) as controllers can, in principle, address these limitations by decomposing the task and solving subtasks by orchestrating a set of (visual) tools.
We present a framework that mitigates these issues by introducing spatially and temporally abstract routines and by leveraging a small number of labeled examples to automatically generate in-context examples.
arXiv Detail & Related papers (2024-01-03T20:48:47Z)
- GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks [3.9638110494107095]
In-context Learning (ICL) is the ability of Large Language Models (LLMs) to perform new tasks when conditioned on prompts.
We propose Example Gisting, a novel approach for training example encoders through supervised fine-tuning.
We show that our fine-tuned models get state-of-the-art ICL performance with over 20% absolute gain over off-the-shelf retrievers.
arXiv Detail & Related papers (2023-11-16T06:28:05Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Efficient Sub-structured Knowledge Distillation [52.5931565465661]
We propose an approach that is much simpler in its formulation and far more efficient for training than existing approaches.
We transfer the knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of the whole output space.
arXiv Detail & Related papers (2022-03-09T15:56:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.