In-context Vectors: Making In Context Learning More Effective and
Controllable Through Latent Space Steering
- URL: http://arxiv.org/abs/2311.06668v3
- Date: Tue, 13 Feb 2024 22:37:39 GMT
- Title: In-context Vectors: Making In Context Learning More Effective and
Controllable Through Latent Space Steering
- Authors: Sheng Liu, Haotian Ye, Lei Xing, James Zou
- Abstract summary: Large language models (LLMs) demonstrate emergent in-context learning capabilities.
We propose an alternative approach that recasts in-context learning as in-context vectors (ICV).
ICV achieves better performance compared to standard in-context learning.
- Score: 37.334374583093165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) demonstrate emergent in-context learning
capabilities, where they adapt to new tasks based on example demonstrations.
However, in-context learning has seen limited effectiveness in many settings,
is difficult to quantitatively control and takes up context window space. To
overcome these limitations, we propose an alternative approach that recasts
in-context learning as in-context vectors (ICV). Using ICV has two steps. We
first use a forward pass on demonstration examples to create the in-context
vector from the latent embedding of the LLM. This vector captures essential
information about the intended task. On a new query, instead of adding
demonstrations to the prompt, we shift the latent states of the LLM using the
ICV. The ICV approach has several benefits: 1) it enables the LLM to more
effectively follow the demonstration examples; 2) it's easy to control by
adjusting the magnitude of the ICV; 3) it reduces the length of the prompt by
removing the in-context demonstrations; 4) ICV is computationally much more
efficient than fine-tuning. We demonstrate that ICV achieves better performance
compared to standard in-context learning and fine-tuning on diverse tasks
including safety, style transfer, role-playing and formatting. Moreover, we
show that we can flexibly teach the LLM to simultaneously follow different
types of instructions by simple vector arithmetic on the corresponding ICVs.
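The two-step ICV procedure described in the abstract can be sketched with an off-the-shelf causal LM and forward hooks. The sketch below is illustrative only: the placeholder model (gpt2), building the ICV as the mean difference between last-token hidden states of target and source demonstrations, steering every transformer block, and the scale alpha = 0.1 are assumptions made for this example, not the authors' exact recipe.

```python
# Illustrative sketch of in-context vectors (ICV), not the authors' code.
# Assumed: gpt2 as a placeholder LLM, last-token hidden-state differences
# as the task vector, and additive steering of every transformer block.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def last_token_states(text: str) -> torch.Tensor:
    """Forward pass; return the last-token hidden state at every layer."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    # out.hidden_states: (n_layers + 1) tensors of shape (1, seq_len, hidden)
    return torch.stack([h[0, -1] for h in out.hidden_states])

# Step 1: one forward pass per demonstration to build the in-context vector.
# Here the ICV is the mean latent difference between target and source texts
# (an assumed, simple reading of "captures essential information about the task").
demos = [
    ("I hate this movie.", "I love this movie."),
    ("The service was awful.", "The service was wonderful."),
]
icv = torch.stack(
    [last_token_states(tgt) - last_token_states(src) for src, tgt in demos]
).mean(dim=0)  # shape: (n_layers + 1, hidden)

# Step 2: at query time, shift each block's latent states by the ICV instead
# of prepending demonstrations; alpha controls the steering strength.
alpha = 0.1
hooks = []
for i, block in enumerate(model.transformer.h):  # GPT-2 block list; other LLMs differ
    def make_hook(layer_idx):
        def hook(module, inputs, output):
            hidden = output[0] + alpha * icv[layer_idx + 1].to(output[0].dtype)
            return (hidden,) + tuple(output[1:])
        return hook
    hooks.append(block.register_forward_hook(make_hook(i)))

query = "The plot was a complete mess."
out_ids = model.generate(**tok(query, return_tensors="pt"), max_new_tokens=20)
print(tok.decode(out_ids[0], skip_special_tokens=True))

for h in hooks:  # remove the steering hooks when done
    h.remove()
```

In this sketch, benefit 2 (controllability) corresponds to tuning alpha, and the vector arithmetic mentioned at the end of the abstract would amount to summing ICVs computed from different demonstration sets (e.g. a style ICV plus a safety ICV) before applying the same steering step.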
Related papers
- Vector-ICL: In-context Learning with Continuous Vector Representations [75.96920867382859]
Large language models (LLMs) have shown remarkable in-context learning capabilities on textual data.
We explore whether these capabilities can be extended to continuous vectors from diverse domains, obtained from black-box pretrained encoders.
In particular, we find that pretraining projectors with general language modeling objectives enables Vector-ICL.
arXiv Detail & Related papers (2024-10-08T02:25:38Z)
- LIVE: Learnable In-Context Vector for Visual Question Answering [37.89141789981324]
We develop Large Multimodal Models (LMMs) with In-Context Learning (ICL) capabilities.
Applying ICL usually faces two major challenges: 1) using more in-context demonstrations (ICDs) greatly increases inference time, and 2) performance is sensitive to the selection of ICDs.
We propose Learn In-Context VEctor (LIVE) to distill task information from demonstrations, improving ICL performance in LMMs.
arXiv Detail & Related papers (2024-06-19T03:33:45Z)
- Implicit In-context Learning [37.0562059811099]
In-context Learning (ICL) empowers large language models to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries.
We introduce Implicit In-context Learning (I2CL), an innovative paradigm that addresses the challenges associated with traditional ICL by absorbing demonstration examples within the activation space.
I2CL achieves few-shot performance with zero-shot cost and exhibits robustness against the variation of demonstration examples.
arXiv Detail & Related papers (2024-05-23T14:57:52Z)
- Towards Multimodal In-Context Learning for Vision & Language Models [21.69457980865084]
State-of-the-art Vision-Language Models (VLMs) ground the vision and language modalities.
We propose a simple yet surprisingly effective multi-turn curriculum-based learning methodology with effective data mixes.
arXiv Detail & Related papers (2024-03-19T13:53:37Z)
- VILA: On Pre-training for Visual Language Models [74.08039416548209]
We study the design options for VLM pre-training through step-by-step controllable comparisons.
We build VILA, a Visual Language model family that consistently outperforms the state-of-the-art models.
arXiv Detail & Related papers (2023-12-12T18:58:18Z)
- Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework divides the ICL process into two distinct stages: a Deep-Thinking stage and a test stage.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z)
- What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning [24.395288160951118]
Large language models (LLMs) exploit in-context learning (ICL) to solve tasks with only a few demonstrations.
We characterize two ways through which ICL leverages demonstrations.
We show that models can achieve non-trivial performance with task recognition (TR) alone, and that TR does not further improve with larger models or more demonstrations.
arXiv Detail & Related papers (2023-05-16T18:05:19Z)
- ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction [56.790794611002106]
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning.
We propose a simple but effective in-context learning framework called ICL-D3IE.
Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z)
- Contrastive Visual-Linguistic Pretraining [48.88553854384866]
Contrastive Visual-Linguistic Pretraining constructs a visual self-supervised loss built upon contrastive learning.
We evaluate it on several down-stream tasks, including VQA, GQA and NLVR2.
arXiv Detail & Related papers (2020-07-26T14:26:18Z)