Iterative Forward Tuning Boosts In-Context Learning in Language Models
- URL: http://arxiv.org/abs/2305.13016v3
- Date: Tue, 4 Jun 2024 05:49:01 GMT
- Title: Iterative Forward Tuning Boosts In-Context Learning in Language Models
- Authors: Jiaxi Yang, Binyuan Hui, Min Yang, Bailin Wang, Bowen Li, Binhua Li, Fei Huang, Yongbin Li,
- Abstract summary: In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs)
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
- Score: 88.25013390669845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the advancements in in-context learning (ICL) for large language models (LLMs), current research centers on specific prompt engineering, such as demonstration selection, with the expectation that a single iteration of demonstrations processing can generalize effectively to a given test sample. However, this perspective overlooks the potential benefits derived from multiple iterations involving demonstrations, a practice aligning more closely with the iterative decision-making process exhibited by humans, who often learn through analogy. In this study, we introduce a novel two-stage framework to boost ICL in LLMs. Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages. The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation. This mechanism operates by manipulating the Key-Value matrices without training, fostering enhanced understanding capabilities in LLMs by thinking demonstrations multiple times. We evaluated Deep-Thinking across a range of benchmarks and LLMs, showing its superior performance over vanilla ICL methods and its effectiveness in challenging tasks where demonstration selection is infeasible.
Related papers
- Focused Large Language Models are Stable Many-Shot Learners [18.783939647966776]
In-Context Learning (ICL) enables large language models (LLMs) to achieve rapid task adaptation by learning from demonstrations.
We propose a training-free method FocusICL, which conducts triviality filtering to avoid attention being diverted by unimportant contents.
We show that FocusICL achieves an average performance improvement of 5.2% over vanilla ICL and scales well with many-shot demonstrations.
arXiv Detail & Related papers (2024-08-26T02:53:24Z) - Large Language Models Know What Makes Exemplary Contexts [42.90814615222177]
In-context learning (ICL) has proven to be a significant capability with the advancement of Large Language models (LLMs)
This paper presents a unified framework for LLMs that allows them to self-select influential in-context examples to compose their contexts.
arXiv Detail & Related papers (2024-08-14T12:32:41Z) - From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning [47.82447085244952]
We show that modalities matter differently across tasks in multimodal ICL.
Guided by task-specific modality impact, we recommend modality-driven demonstration strategies to boost ICL performance.
arXiv Detail & Related papers (2024-07-01T01:57:21Z) - Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning [43.356895599336504]
We analyze the working mechanisms of the learning-based demonstration selection methods.
We empirically identify two important factors related to similarity measurement.
We introduce two effective yet simplified exemplar selection methods catering to task-agnostic and task-specific demands.
arXiv Detail & Related papers (2024-06-14T03:34:02Z) - ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL)
arXiv Detail & Related papers (2024-03-31T05:56:15Z) - C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z) - Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning [9.660673938961416]
Demonstration ordering is an important strategy for in-context learning (ICL)
We propose a simple but effective demonstration ordering method for ICL, named the few-shot In-Context Curriculum Learning (ICCL)
arXiv Detail & Related papers (2024-02-16T14:55:33Z) - Understanding and Improving In-Context Learning on Vision-language
Models [42.7212469140844]
In-context learning (ICL) on large language models (LLMs) has received great attention, and this technique can be applied to vision-language models (VLMs)
This study investigates the significance of both visual and language information.
We propose a simple yet effective approach, termed Mixed Modality In-Context Example Selection (MMICES)
arXiv Detail & Related papers (2023-11-29T19:08:11Z) - Dynamic Demonstrations Controller for In-Context Learning [51.3439660534631]
In-Context Learning (ICL) is a new paradigm for natural language processing (NLP), where a large language model observes a small number of demonstrations and a test instance as its input.
Previous studies have revealed that ICL is sensitive to the selection and the ordering of demonstrations.
We propose a Dynamic Demonstrations Controller (D$2$Controller), which can improve the ICL performance by adjusting the number of demonstrations.
arXiv Detail & Related papers (2023-09-30T14:04:22Z) - Scaling In-Context Demonstrations with Structured Attention [75.41845145597875]
We propose a better architectural design for in-context learning.
Structured Attention for In-Context Learning replaces the full-attention by a structured attention mechanism.
We show that SAICL achieves comparable or better performance than full attention while obtaining up to 3.4x inference speed-up.
arXiv Detail & Related papers (2023-07-05T23:26:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.