Structured Prompting: Scaling In-Context Learning to 1,000 Examples
- URL: http://arxiv.org/abs/2212.06713v1
- Date: Tue, 13 Dec 2022 16:31:21 GMT
- Title: Structured Prompting: Scaling In-Context Learning to 1,000 Examples
- Authors: Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei
- Abstract summary: We introduce structured prompting that breaks the length limit and scales in-context learning to thousands of examples.
Specifically, demonstration examples are separately encoded with well-designed position embeddings, and then they are jointly attended by the test example using a rescaled attention mechanism.
- Score: 78.41281805608081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have exhibited intriguing in-context learning
capability, achieving promising zero- and few-shot performance without updating
the parameters. However, conventional in-context learning is usually restricted
by length constraints, rendering it ineffective to absorb supervision from a
large number of examples. In order to go beyond few shots, we introduce
structured prompting that breaks the length limit and scales in-context
learning to thousands of examples. Specifically, demonstration examples are
separately encoded with well-designed position embeddings, and then they are
jointly attended by the test example using a rescaled attention mechanism. So
we can scale the number of exemplars with linear complexity instead of
quadratic complexity with respect to length. Experimental results on a diverse
set of tasks show that our approach improves end-task performance and reduces
evaluation variance over conventional in-context learning as the number of
demonstration examples increases. Code has been released at
https://aka.ms/structured-prompting.
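The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It assumes that each group of demonstrations has already been encoded separately into key/value matrices, and it assumes one plausible form of rescaling: the test input's own attention weights are multiplied by the number of groups M before normalization. The abstract does not spell out the rescaling, so this choice, the function name `rescaled_attention`, and all parameter names are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def rescaled_attention(q, group_keys, group_values, test_keys, test_values):
    """Single-head attention of one test query over M separately encoded groups.

    q:            (d,) query vector of a test token
    group_keys:   list of M arrays, each (n_i, d), one per demonstration group
    group_values: list of M arrays, each (n_i, d)
    test_keys:    (t, d) keys of the test input itself
    test_values:  (t, d) values of the test input itself
    """
    d = q.shape[-1]
    m = len(group_keys)

    # Groups were encoded independently; here they are only concatenated.
    ctx_k = np.concatenate(group_keys, axis=0)
    ctx_v = np.concatenate(group_values, axis=0)

    ctx_scores = ctx_k @ q / np.sqrt(d)
    # Assumed rescaling: adding log(M) to the test-input scores multiplies
    # their unnormalized weights by M, balancing them against M groups.
    test_scores = test_keys @ q / np.sqrt(d) + np.log(m)

    scores = np.concatenate([ctx_scores, test_scores])
    scores = scores - scores.max()  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()

    values = np.concatenate([ctx_v, test_values], axis=0)
    return weights @ values, weights
```

Under this assumed rescaling, attending to M identical copies of a group gives the same output as attending to a single copy, so adding groups does not drown out the test input's own tokens. The linear-complexity claim follows from the separate encoding: M groups of length n cost on the order of M·n² in self-attention, rather than (M·n)² for one concatenated sequence.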
Related papers
- Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks.
We modify specific context tokens, considering the unique structure of input and output formats.
Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss.
arXiv Detail & Related papers (2024-10-22T17:45:47Z)
- In-Context Learning with Long-Context Models: An In-Depth Exploration [96.1389740719691]
We show that, for many datasets with large label spaces, performance continues to increase with hundreds or thousands of demonstrations.
We show that although long-context ICL can be surprisingly effective, most of this gain comes from attending back to similar examples.
arXiv Detail & Related papers (2024-04-30T21:06:52Z)
- Many-Shot In-Context Learning [58.395589302800566]
Large language models (LLMs) excel at few-shot in-context learning (ICL).
We observe significant performance gains across a wide variety of generative and discriminative tasks.
Unlike few-shot learning, many-shot learning is effective at overriding pretraining biases.
arXiv Detail & Related papers (2024-04-17T02:49:26Z)
- EXnet: Efficient In-context Learning for Data-less Text classification [0.0]
We present EXnet, a model specifically designed to perform in-context learning without limitations on the number of examples.
We argue that in-context learning is an effective method to increase task accuracy, and providing examples facilitates cross-task generalization.
With extensive experiments, we show that even our smallest model (15M parameters) generalizes to several unseen classification tasks and domains.
arXiv Detail & Related papers (2023-05-24T01:40:57Z)
- RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z)
- In-Context Probing: Toward Building Robust Classifiers via Probing Large Language Models [5.5089506884366735]
In this paper, we propose an alternative approach, which we term In-Context Probing (ICP).
Similar to in-context learning, we contextualize the representation of the input with an instruction, but instead of decoding the output prediction, we probe the contextualized representation to predict the label.
We show that ICP performs competitively with or better than finetuning and can be particularly helpful for building classifiers on top of smaller models.
arXiv Detail & Related papers (2023-05-23T15:43:04Z)
- ScatterShot: Interactive In-context Example Curation for Text Transformation [44.9405895390925]
We present ScatterShot, an interactive system for building high-quality demonstration sets for in-context learning.
ScatterShot iteratively slices unlabeled data into task-specific patterns and samples informative inputs from underexplored or not-yet-saturated slices in an active learning manner.
In a user study, ScatterShot greatly helps users in covering different patterns in the input space and labeling in-context examples more efficiently.
arXiv Detail & Related papers (2023-02-14T21:13:31Z)
- Reordering Examples Helps during Priming-based Few-Shot Learning [6.579039107070663]
We show that PERO can learn to generalize efficiently using as few as 10 examples.
We demonstrate the effectiveness of the proposed method on the tasks of sentiment classification, natural language inference and fact retrieval.
arXiv Detail & Related papers (2021-06-03T11:02:36Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)