Scaling In-Context Demonstrations with Structured Attention
- URL: http://arxiv.org/abs/2307.02690v1
- Date: Wed, 5 Jul 2023 23:26:01 GMT
- Title: Scaling In-Context Demonstrations with Structured Attention
- Authors: Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang
- Abstract summary: We propose a better architectural design for in-context learning.
SAICL (Structured Attention for In-Context Learning) replaces full attention with a structured attention mechanism.
We show that SAICL achieves comparable or better performance than full attention while obtaining up to 3.4x inference speed-up.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent surge of large language models (LLMs) highlights their ability to
perform in-context learning, i.e., "learning" to perform a task from a few
demonstrations in the context without any parameter updates. However, their
capabilities of in-context learning are limited by the model architecture: 1)
the use of demonstrations is constrained by a maximum sentence length due to
positional embeddings; 2) the quadratic complexity of attention hinders users
from using more demonstrations efficiently; 3) LLMs are shown to be sensitive
to the order of the demonstrations. In this work, we tackle these challenges by
proposing a better architectural design for in-context learning. We propose
SAICL (Structured Attention for In-Context Learning), which replaces
full attention with a structured attention mechanism designed for in-context
learning, removes unnecessary dependencies between individual
demonstrations, and makes the model invariant to the permutation of
demonstrations. We evaluate SAICL in a meta-training framework and show that
SAICL achieves comparable or better performance than full attention while
obtaining up to 3.4x inference speed-up. SAICL also consistently outperforms a
strong Fusion-in-Decoder (FiD) baseline which processes each demonstration
independently. Finally, thanks to its linear nature, we demonstrate that SAICL
can easily scale to hundreds of demonstrations, with continued performance
gains as the number of demonstrations grows.
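The abstract pins down the mechanism's key properties: demonstrations are processed without cross-demonstration dependencies, the model is permutation-invariant, and cost grows linearly with the number of demonstrations. Below is a minimal sketch of an attention mask with those properties, assuming a block-diagonal layout over demonstrations plus full query access; the function and argument names (saicl_style_mask, demo_lens, query_len) are hypothetical, and this is not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of a structured attention mask
# with the properties the abstract describes: each demonstration attends
# only within itself, the test input attends to everything, and there are
# no cross-demonstration dependencies, so demonstration order is
# irrelevant. The tensor layout is an illustrative assumption.
import torch

def saicl_style_mask(demo_lens, query_len):
    """Build a boolean attention mask (True = may attend).

    demo_lens: token length of each demonstration
    query_len: token length of the test input, placed after all demos
    """
    total = sum(demo_lens) + query_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for n in demo_lens:
        # Block diagonal: a demonstration attends only to its own tokens.
        mask[start:start + n, start:start + n] = True
        start += n
    # The test input attends to all demonstrations and to itself.
    mask[start:, :] = True
    return mask

mask = saicl_style_mask(demo_lens=[4, 3, 5], query_len=2)
print(mask.shape)  # torch.Size([14, 14])
```

Because the demonstration blocks never interact, the attention cost of the demonstration region grows linearly with the number of demonstrations rather than quadratically, which is consistent with the scaling behavior the abstract reports.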
Related papers
- DemoShapley: Valuation of Demonstrations for In-Context Learning
Large language models (LLMs) leveraging in-context learning (ICL) have set new benchmarks in few-shot learning across various tasks without needing task-specific fine-tuning.
We introduce DemoShapley, which is inspired by the Data Shapley valuation framework (a minimal valuation sketch appears after this list).
Our findings reveal that DemoShapley not only enhances model performance in terms of accuracy and fairness but also generalizes to queries from domains distinct from those of the in-context demonstrations.
arXiv Detail & Related papers (2024-10-10T01:35:03Z)
- Focused Large Language Models are Stable Many-Shot Learners
In-Context Learning (ICL) enables large language models (LLMs) to achieve rapid task adaptation by learning from demonstrations.
We propose FocusICL, a training-free method that performs triviality filtering so that attention is not diverted by unimportant content.
We show that FocusICL achieves an average performance improvement of 5.2% over vanilla ICL and scales well with many-shot demonstrations.
arXiv Detail & Related papers (2024-08-26T02:53:24Z)
- DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters.
We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL.
We experimentally demonstrate the wide applicability of DETAIL by showing that attribution scores obtained on white-box models transfer to black-box models and improve model performance.
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
- Are Human-generated Demonstrations Necessary for In-context Learning?
The self-contemplation prompting strategy (SEC) is a paradigm free of human-crafted demonstrations.
Extensive experiments on arithmetic reasoning, commonsense reasoning, multi-task language understanding, and code generation benchmarks show that SEC significantly outperforms the zero-shot learning strategy.
arXiv Detail & Related papers (2023-09-26T05:10:08Z)
- Iterative Forward Tuning Boosts In-Context Learning in Language Models
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z)
- What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
Large language models (LLMs) exploit in-context learning (ICL) to solve tasks with only a few demonstrations.
We characterize two ways through which ICL leverages demonstrations: task recognition (TR) and task learning (TL).
We show that models can achieve non-trivial performance with TR alone, and that TR does not further improve with larger models or more demonstrations.
arXiv Detail & Related papers (2023-05-16T18:05:19Z)
- ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning.
We propose a simple but effective in-context learning framework called ICL-D3IE.
Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z)
- Robustness of Demonstration-based Learning Under Limited Data Scenario
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability under limited-data scenarios.
Why such demonstrations benefit the learning process remains unclear, since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones to take a deep dive into the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z)
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Large language models (LMs) are able to in-context learn by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.
We show that ground truth demonstrations are in fact not required: randomly replacing labels in the demonstrations barely hurts performance.
We find that other aspects of the demonstrations are the key drivers of end task performance.
arXiv Detail & Related papers (2022-02-25T17:25:19Z)
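As a companion to the DemoShapley entry above, here is a hedged sketch of Data-Shapley-style valuation of in-context demonstrations via Monte Carlo permutation sampling; the sampling scheme, the rounds parameter, and the accuracy callback are illustrative assumptions, not the paper's algorithm.

```python
# A hedged sketch of Data-Shapley-style valuation of demonstrations, in the
# spirit of the DemoShapley entry above. The `accuracy` callback (a function
# from a list of demonstrations to a validation score) and the sampling
# budget are illustrative assumptions.
import random

def monte_carlo_demo_shapley(demos, accuracy, rounds=200, seed=0):
    """Estimate each demonstration's Shapley value via random permutations."""
    rng = random.Random(seed)
    values = [0.0] * len(demos)
    for _ in range(rounds):
        order = list(range(len(demos)))
        rng.shuffle(order)
        prefix, prev = [], accuracy([])
        for i in order:
            prefix.append(demos[i])
            score = accuracy(prefix)
            values[i] += score - prev  # marginal contribution of demo i
            prev = score
    return [v / rounds for v in values]
```

Permutation sampling is the standard unbiased estimator for Shapley values; demonstrations with high estimated value would then be preferred when composing prompts.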