Efficient Prompting via Dynamic In-Context Learning
- URL: http://arxiv.org/abs/2305.11170v1
- Date: Thu, 18 May 2023 17:58:31 GMT
- Title: Efficient Prompting via Dynamic In-Context Learning
- Authors: Wangchunshu Zhou, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan
- Abstract summary: We propose DynaICL, a recipe for efficient prompting with black-box generalist models.
DynaICL dynamically allocates in-context examples according to the input complexity and the computational budget.
We find that DynaICL saves up to 46% of the token budget compared to the common practice of allocating the same number of in-context examples to each input.
- Score: 76.83516913735072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The primary way of building AI applications is shifting from training
specialist models to prompting generalist models. A common practice for
prompting generalist models, often referred to as in-context learning, is to
append a few examples (demonstrations) to the prompt to help the model better
understand the task. While effective, in-context learning can be inefficient
because it makes the input prompt much longer, consuming valuable space in the
context window and leading to larger computational costs. In this paper, we
propose DynaICL, a recipe for efficient prompting with black-box generalist
models that dynamically allocates in-context examples according to the input
complexity and the computational budget. To achieve this, we train a meta
controller that predicts the number of in-context examples suitable for the
generalist model to make a good prediction based on the performance-efficiency
trade-off for a specific input. We then dynamically allocate the number of
demonstrations for an input according to predictions from the meta controller
and the given computation budget. Experimental results show that dynamic
example allocation helps achieve a better performance-efficiency trade-off in
two practical settings where computational resources or the required
performance is constrained. Specifically, DynaICL saves up to 46% of the token budget
compared to the common practice that allocates the same number of in-context
examples to each input. We also find that a meta controller trained on a
certain backbone model and tasks can successfully generalize to unseen models
and tasks.
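To make the allocation recipe above concrete, here is a minimal sketch. It assumes a hypothetical meta_controller callable that maps each input to a suggested number of demonstrations and a fixed average demonstration length (TOKENS_PER_DEMO); these names are illustrative and do not come from the paper, and the authors' actual controller is a trained model rather than a heuristic.

```python
# Minimal sketch of dynamic in-context example allocation (an illustration of
# the idea, not the authors' released implementation). A meta controller
# suggests how many demonstrations each input needs; the suggestions are then
# rescaled so the total stays within a global token budget.
from typing import Callable, List

TOKENS_PER_DEMO = 60  # assumed average token length of one demonstration


def allocate_demonstrations(
    inputs: List[str],
    meta_controller: Callable[[str], int],  # predicts demos needed per input
    token_budget: int,
    max_demos: int = 8,
) -> List[int]:
    """Return the number of in-context examples to attach to each input."""
    # Per-input suggestions from the meta controller, clipped to [0, max_demos].
    suggested = [min(max_demos, max(0, meta_controller(x))) for x in inputs]

    # Rescale uniformly if the suggestions would exceed the token budget.
    total_tokens = sum(suggested) * TOKENS_PER_DEMO
    if total_tokens > token_budget:
        scale = token_budget / total_tokens
        suggested = [int(n * scale) for n in suggested]
    return suggested


def toy_controller(text: str) -> int:
    # Crude proxy for input complexity: longer inputs get more demonstrations.
    return min(8, len(text.split()) // 5 + 1)


queries = [
    "Is 7 prime?",
    "Summarize the trade-offs of dynamic prompting under a fixed context window.",
]
print(allocate_demonstrations(queries, toy_controller, token_budget=400))
```

Under such a scheme, harder inputs receive more demonstrations while easy ones receive few or none, which is how a dynamic policy can match the accuracy of uniform allocation with a smaller total token budget.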
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- Rational Metareasoning for Large Language Models [5.5539136805232205]
Being prompted to engage in reasoning has emerged as a core technique for using large language models (LLMs).
This work introduces a novel approach based on computational models of metareasoning used in cognitive science.
We develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning (a toy sketch of such a penalty appears after this list).
arXiv Detail & Related papers (2024-10-07T23:48:52Z)
- Learning to Reduce: Optimal Representations of Structured Data in Prompting Large Language Models [42.16047343029512]
Large Language Models (LLMs) have been widely used as general-purpose AI agents.
We propose a framework, Learning to Reduce, that fine-tunes a language model to generate a reduced version of an input context.
We show that our model achieves comparable accuracies in selecting the relevant evidence from an input context.
arXiv Detail & Related papers (2024-02-22T00:41:23Z)
- Vocabulary-Defined Semantics: Latent Space Clustering for Improving In-Context Learning [32.178931149612644]
In-context learning enables language models to adapt to downstream data or incorporate new tasks using a few samples as demonstrations within the prompt.
However, the performance of in-context learning can be unstable depending on the quality, format, or order of demonstrations.
We propose a novel approach, "vocabulary-defined semantics".
arXiv Detail & Related papers (2024-01-29T14:29:48Z)
- Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers [54.83459025465947]
Even the largest models struggle with compositional reasoning, generalization, fine-grained spatial and temporal reasoning, and counting.
Visual reasoning with large language models (LLMs) as controllers can, in principle, address these limitations by decomposing the task and solving subtasks by orchestrating a set of (visual) tools.
We present a framework that mitigates these issues by introducing spatially and temporally abstract routines and by leveraging a small number of labeled examples to automatically generate in-context examples.
arXiv Detail & Related papers (2024-01-03T20:48:47Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z)
- Efficient Sub-structured Knowledge Distillation [52.5931565465661]
We propose an approach that is much simpler in its formulation and far more efficient for training than existing approaches.
We transfer the knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of the whole output space.
arXiv Detail & Related papers (2022-03-09T15:56:49Z)
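As a side note on the Rational Metareasoning entry above, the sketch below shows one simple shape such a Value-of-Computation reward could take: reward task success while charging a scalar per-token cost for emitted reasoning. The function name and the cost constant are illustrative assumptions, not that paper's actual formulation.

```python
# Toy Value-of-Computation-style reward (an assumed shape, not the cited
# paper's exact objective): reward task success, but charge a per-token cost
# for the reasoning the model emits before answering.
def metareasoning_reward(is_correct: bool, reasoning_tokens: int,
                         token_cost: float = 0.001) -> float:
    task_reward = 1.0 if is_correct else 0.0
    return task_reward - token_cost * reasoning_tokens


# A short correct answer scores higher than a long incorrect chain of thought.
print(metareasoning_reward(is_correct=True, reasoning_tokens=50))
print(metareasoning_reward(is_correct=False, reasoning_tokens=400))
```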