Related papers: Spotlight Your Instructions: Instruction-following with Dynamic Attention Steering

URL: http://arxiv.org/abs/2505.12025v1
Date: Sat, 17 May 2025 14:28:53 GMT
Title: Spotlight Your Instructions: Instruction-following with Dynamic Attention Steering
Authors: Praveen Venkateswaran, Danish Contractor
Abstract summary: We present an inference-time method that enables users to emphasize specific parts of their prompt by steering the model's attention toward them. Unlike prior approaches, we dynamically update the proportion of model attention given to the user-specified parts--ensuring improved instruction following without performance degradation.
Score: 5.160554120418462
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In many real-world applications, users rely on natural language instructions to guide large language models (LLMs) across a wide range of tasks. These instructions are often complex, diverse, and subject to frequent change. However, LLMs do not always attend to these instructions reliably, and users lack simple mechanisms to emphasize their importance beyond modifying prompt wording or structure. To address this, we present an inference-time method that enables users to emphasize specific parts of their prompt by steering the model's attention toward them, aligning the model's perceived importance of different prompt tokens with user intent. Unlike prior approaches that are limited to static instructions, require significant offline profiling, or rely on fixed biases, we dynamically update the proportion of model attention given to the user-specified parts--ensuring improved instruction following without performance degradation. We demonstrate that our approach improves instruction following across a variety of tasks involving multiple instructions and generalizes across models of varying scales.
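
The idea of raising the proportion of attention given to a user-marked span can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single attention row over prompt tokens, a hypothetical `target` proportion, and a hypothetical helper `steer_attention` that adds one shared bias to the emphasized logits only when their current share falls below the target. The bias has a closed form: if the span currently holds proportion p and the target is t, adding b = log(t(1 - p) / (p(1 - t))) to the span's logits makes its share exactly t.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def steer_attention(logits, emphasized, target):
    """Return attention weights whose share on `emphasized` is at least `target`.

    `emphasized` is a set of token indices; `target` is a hypothetical
    user-chosen proportion in (0, 1). Both names are illustrative assumptions.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    weights = softmax(logits)
    current = sum(weights[i] for i in emphasized)
    # Leave the row untouched if the span already gets enough attention
    # (or if there is nothing to boost).
    if current <= 0.0 or current >= target:
        return weights
    # One shared additive bias that moves the span's share from `current` to `target`.
    bias = math.log(target * (1.0 - current) / (current * (1.0 - target)))
    steered = [x + bias if i in emphasized else x for i, x in enumerate(logits)]
    return softmax(steered)


# Toy attention row over four prompt tokens; tokens 1 and 2 are the marked span.
logits = [2.0, 0.0, 0.0, 1.0]
span = {1, 2}
before = softmax(logits)
after = steer_attention(logits, span, target=0.5)
print(sum(before[i] for i in span), sum(after[i] for i in span))
```

Because the bias is recomputed from the current attention share each time, the adjustment in this sketch depends on the input rather than being a fixed constant, and the relative weights among non-emphasized tokens are preserved.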

Related papers

The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models [48.455388608863785]
We introduce a benchmark designed to evaluate models' abilities to follow multiple instructions through sequential instruction following tasks. Our benchmark evaluates instruction following using four tasks (text modification, question answering, mathematics, and security rules). More recent and larger models significantly outperform their older and smaller counterparts on the SIFo tasks, validating the benchmark's effectiveness.
arXiv Detail & Related papers (2024-06-28T15:34:26Z)
Contrastive Instruction Tuning [61.97704869248903]
We propose Contrastive Instruction Tuning (CoIN) to maximize the similarity between semantically equivalent instruction-instance pairs. Experiments on the PromptBench benchmark show that CoIN consistently improves LLMs' robustness to unseen instructions with variations across character, word, sentence, and semantic levels by an average of +2.5% in accuracy.
arXiv Detail & Related papers (2024-02-17T00:09:32Z)
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs [80.48606583629123]
PASTA is a method that allows large language models to read text with user-specified emphasis marks. It can substantially enhance an LLM's ability to follow user instructions or integrate new knowledge from user inputs.
arXiv Detail & Related papers (2023-11-03T22:56:43Z)
Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions [26.192531184689763]
This paper presents Instructive Decoding (ID), a simple yet effective approach that augments the efficacy of instruction-tuned models. ID adjusts the logits for next-token prediction in a contrastive manner, utilizing predictions generated from a manipulated version of the original instruction. We conduct experiments across a spectrum of such noisy instructions, ranging from those that insert semantic noise via random words to others like 'opposite' that elicit deviated responses.
arXiv Detail & Related papers (2023-11-01T02:31:35Z)
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning [63.63840740526497]
We investigate how instruction tuning adjusts pre-trained models with a focus on intrinsic changes. The impact of instruction tuning is then studied by comparing the explanations derived from the pre-trained and instruction-tuned models. Our findings reveal three significant impacts of instruction tuning.
arXiv Detail & Related papers (2023-09-30T21:16:05Z)
Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection [70.28425745910711]
Large Language Models (LLMs) have demonstrated exceptional proficiency in instruction-following. This capability brings with it the risk of prompt injection attacks. We evaluate the robustness of instruction-following LLMs against such attacks.
arXiv Detail & Related papers (2023-08-17T06:21:50Z)
Evaluating the Zero-shot Robustness of Instruction-tuned Language Models [23.488398944358643]
We find that using novel (unobserved) but appropriate instruction phrasings consistently degrades model performance. We propose a simple method to mitigate this issue by introducing "soft prompt" embedding parameters. We show that this method consistently improves the robustness of instruction-tuned models.
arXiv Detail & Related papers (2023-06-20T03:48:51Z)
Enhancing Large Language Models Against Inductive Instructions with Dual-critique Prompting [55.15697111170836]
This paper reveals the behaviors of large language models (LLMs) toward inductive instructions and enhances their truthfulness and helpfulness accordingly. After extensive human and automatic evaluations, we uncovered a universal vulnerability among LLMs in processing inductive instructions. We identify that different inductive styles affect the models' ability to identify the same underlying errors, and the complexity of the underlying assumptions also influences the model's performance.
arXiv Detail & Related papers (2023-05-23T06:38:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.