Soft Injection of Task Embeddings Outperforms Prompt-Based In-Context Learning
- URL: http://arxiv.org/abs/2507.20906v2
- Date: Tue, 29 Jul 2025 02:15:56 GMT
- Title: Soft Injection of Task Embeddings Outperforms Prompt-Based In-Context Learning
- Authors: Jungwon Park, Wonjong Rhee,
- Abstract summary: In-Context Learning (ICL) enables Large Language Models to perform tasks by conditioning on input-output examples in the prompt.<n>In this work, we propose Soft Injection of task embeddings.<n>Soft injection is performed by softly mixing task embeddings with attention head activations.
- Score: 5.778024594615575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-Context Learning (ICL) enables Large Language Models (LLMs) to perform tasks by conditioning on input-output examples in the prompt, without requiring any update in model parameters. While widely adopted, it remains unclear whether prompting with multiple examples is the most effective and efficient way to convey task information. In this work, we propose Soft Injection of task embeddings. The task embeddings are constructed only once using few-shot ICL prompts and repeatedly used during inference. Soft injection is performed by softly mixing task embeddings with attention head activations using pre-optimized mixing parameters, referred to as soft head-selection parameters. This method not only allows a desired task to be performed without in-prompt demonstrations but also significantly outperforms existing ICL approaches while reducing memory usage and compute cost at inference time. An extensive evaluation is performed across 57 tasks and 12 LLMs, spanning four model families of sizes from 4B to 70B. Averaged across 57 tasks, our method outperforms 10-shot ICL by 10.2%-14.3% across 12 LLMs. Additional analyses show that our method also serves as an insightful tool for analyzing task-relevant roles of attention heads, revealing that task-relevant head positions selected by our method transfer across similar tasks but not across dissimilar ones -- underscoring the task-specific nature of head functionality. Our soft injection method opens a new paradigm for reducing prompt length and improving task performance by shifting task conditioning from the prompt space to the activation space.
Related papers
- From Cross-Task Examples to In-Task Prompts: A Graph-Based Pseudo-Labeling Framework for In-context Learning [55.90498988440303]
In-context learning (ICL) enables large language models to perform novel tasks without parameter updates.<n>We propose a cost-efficient two-stage pipeline that reduces reliance on language models for data labeling.<n> Experiments across five tasks demonstrate that our method achieves strong performance while lowering labeling costs.
arXiv Detail & Related papers (2025-10-28T15:37:51Z) - Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.<n>We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.<n>We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference [44.99833362998488]
Large Language Models (LLMs) have shown impressive performance across a wide range of tasks.<n>We propose a novel approach to automatically extract the subset of the LLM that properly performs a targeted task.<n>We show that the resulting models are considerably smaller, reducing the number of parameters up to 82.77% and (ii) more interpretable.
arXiv Detail & Related papers (2024-12-20T10:11:44Z) - EXPLORA: Efficient Exemplar Subset Selection for Complex Reasoning [5.172620636569522]
Large language models (LLMs) have enabled in-context learning (ICL), allowing LLMs to acquire proficiency in a specific task using only a few demonstration samples (exemplars)
A critical challenge in ICL is the selection of optimal exemplars, which can be either task-specific (static) or test-example-specific (dynamic)
arXiv Detail & Related papers (2024-11-06T12:48:04Z) - QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries.<n>We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.<n> Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance [11.595274304409937]
Large language models (LLMs) have revolutionized zero-shot task performance.
Current methods using trigger phrases such as "Let's think step by step" remain limited.
This study introduces PRomPTed, an approach that optimize the zero-shot prompts for individual task instances.
arXiv Detail & Related papers (2023-10-03T14:51:34Z) - OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z) - Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances to the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z) - Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient
Multi-task Knowledge Sharing [53.399742232323895]
ATTEMPT is a new modular, multi-task, and parameter-efficient language model (LM) tuning approach.
It combines knowledge transferred across different tasks via a mixture of soft prompts while keeping original LM unchanged.
It is parameter-efficient (e.g., updates 1,600 times fewer parameters than fine-tuning) and enables multi-task learning and flexible extensions.
arXiv Detail & Related papers (2022-05-24T10:48:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.