On the Paradoxical Interference between Instruction-Following and Task Solving
- URL: http://arxiv.org/abs/2601.22047v1
- Date: Thu, 29 Jan 2026 17:48:56 GMT
- Title: On the Paradoxical Interference between Instruction-Following and Task Solving
- Authors: Yunjia Qi, Hao Peng, Xintong Shi, Amy Xin, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li
- Abstract summary: Instruction following aims to align Large Language Models (LLMs) with human intent by specifying explicit constraints on how tasks should be performed. We reveal a counterintuitive phenomenon: instruction following can paradoxically interfere with LLMs' task-solving capability. We propose a metric, SUSTAINSCORE, to quantify the interference of instruction following with task solving.
- Score: 50.75960598434753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction following aims to align Large Language Models (LLMs) with human intent by specifying explicit constraints on how tasks should be performed. However, we reveal a counterintuitive phenomenon: instruction following can paradoxically interfere with LLMs' task-solving capability. We propose a metric, SUSTAINSCORE, to quantify the interference of instruction following with task solving. It measures the drop in task performance after inserting into the instruction a self-evident constraint, i.e., one extracted from the original successful model output and therefore naturally satisfied by it. Experiments on current LLMs in mathematics, multi-hop QA, and code generation show that adding such self-evident constraints leads to substantial performance drops, even for advanced models such as Claude-Sonnet-4.5. We validate the generality of the interference across constraint types and scales. Furthermore, we identify common failure patterns and, by investigating the mechanisms of interference, observe that failed cases allocate significantly more attention to constraints than successful ones. Finally, we use SUSTAINSCORE to conduct an initial investigation into how distinct post-training paradigms affect the interference, presenting empirical observations on current alignment strategies. We will release our code and data to facilitate further research.
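As a rough illustration, SUSTAINSCORE can be read as the fraction of originally solved tasks that remain solved once such a constraint is appended. A minimal sketch under that reading, with hypothetical `solve` and `is_correct` callables standing in for the model and the task checker (the paper's exact formulation may differ):

```python
# Minimal sketch of a SUSTAINSCORE-style measurement. `solve` and `is_correct`
# are hypothetical stand-ins for the model call and the task checker; the
# paper's exact definition and constraint-extraction procedure may differ.

def extract_self_evident_constraint(output: str) -> str:
    """Derive a constraint the given successful output already satisfies.
    Trivial placeholder: a length bound the output already meets."""
    return f"Your answer must be at most {len(output.split()) + 10} words."

def sustain_score(tasks, solve, is_correct) -> float:
    """Fraction of originally solved tasks that remain solved after a
    self-evident constraint is appended to the instruction."""
    solved, sustained = 0, 0
    for task in tasks:
        original = solve(task)
        if not is_correct(task, original):
            continue  # only tasks the model already solves are counted
        solved += 1
        constraint = extract_self_evident_constraint(original)
        if is_correct(task, solve(f"{task}\n{constraint}")):
            sustained += 1
    return sustained / solved if solved else float("nan")
```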
Related papers
- Code-driven Number Sequence Calculation: Enhancing the Inductive Reasoning Abilities of Large Language Models [44.17697803306198]
We introduce CodeSeq, a synthetic post-training dataset built from number sequences. Our pipeline generates supervised fine-tuning data by reflecting on failed test cases and incorporating iterative corrections. Experimental results show that models trained with CodeSeq improve on various reasoning tasks and can preserve the models' out-of-distribution (OOD) performance.
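A minimal sketch of such a reflect-and-correct pipeline, assuming hypothetical `generate_solution`, `run_tests`, and `reflect` callables rather than the paper's actual components:

```python
# Hypothetical sketch of a reflect-and-correct data pipeline in the spirit of
# CodeSeq. `generate_solution`, `run_tests`, and `reflect` are illustrative
# stand-ins, not the paper's actual components.

def build_sft_examples(problems, generate_solution, run_tests, reflect,
                       max_rounds: int = 3):
    """Collect (prompt, completion) pairs, repairing failures via reflection."""
    examples = []
    for problem in problems:
        solution = generate_solution(problem)
        for _ in range(max_rounds):
            failures = run_tests(problem, solution)  # failing test cases
            if not failures:
                examples.append({"prompt": problem, "completion": solution})
                break
            # Feed the failing cases back to the model as a correction hint.
            solution = reflect(problem, solution, failures)
    return examples
```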
arXiv Detail & Related papers (2025-10-16T12:29:40Z)
- HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness [49.72591739116668]
Reinforcement Learning (RL) has become a key driver for enhancing the long chain-of-thought (CoT) reasoning capabilities of Large Language Models (LLMs). However, prevalent methods like GRPO often fail when task difficulty exceeds the model's capacity, leading to reward sparsity and inefficient training. We propose HINT: Helping Ineffective rollouts Navigate Towards effectiveness, an adaptive hinting framework.
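A minimal sketch of what adaptive hinting could look like in a GRPO-style rollout loop, with hypothetical `rollout`, `reward`, and `make_hint` callables (the paper's triggering criterion may differ):

```python
# Hypothetical sketch of adaptive hinting in a GRPO-style rollout loop.
# `rollout`, `reward`, and `make_hint` are illustrative stand-ins; HINT's
# actual triggering criterion and hint construction may differ.

def collect_group(prompt, rollout, reward, make_hint, group_size: int = 8):
    """Sample a rollout group; if every rollout fails (uniformly zero reward,
    hence no group-relative learning signal), retry with a hint appended."""
    responses = [rollout(prompt) for _ in range(group_size)]
    rewards = [reward(prompt, r) for r in responses]
    if max(rewards) <= 0:  # ineffective group: reward signal is sparse
        hinted = prompt + "\nHint: " + make_hint(prompt)
        responses = [rollout(hinted) for _ in range(group_size)]
        rewards = [reward(prompt, r) for r in responses]
    return responses, rewards
```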
arXiv Detail & Related papers (2025-10-10T13:42:03Z)
- Steering When Necessary: Flexible Steering Large Language Models with Backtracking [16.23081952791394]
Large language models (LLMs) have achieved remarkable performance across many generation tasks. Activation steering is an effective and cost-efficient approach that directly modifies the activations of LLMs during the inference stage. We propose the Flexible Activation Steering with Backtracking (FASB) framework, which dynamically determines both the necessity and strength of intervention.
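A minimal sketch of steering only when necessary, with backtracking, assuming a hypothetical scalar probe, a precomputed steering vector, and stand-in decoding callables; this is not the paper's implementation:

```python
import torch

# Hypothetical sketch of steering only when necessary, with backtracking.
# `hidden_of`, `next_token`, `probe`, and `steer_vec` are illustrative
# stand-ins, not the FASB paper's actual components.

def steered_decode(ids, hidden_of, next_token, probe, steer_vec,
                   steps=64, threshold=0.5, window=8):
    """Decode step by step; when a scalar probe on the hidden state flags
    undesired behaviour, backtrack a few tokens and resume with steering
    whose strength scales with the probe score."""
    strength = 0.0
    for _ in range(steps):
        h = hidden_of(ids)                      # last hidden state (stand-in)
        score = torch.sigmoid(probe(h)).item()  # P(undesired behaviour)
        if strength == 0.0 and score > threshold:
            if len(ids) > window:
                ids = ids[:-window]             # backtrack recent tokens
            strength = score                    # intervene proportionally
            continue
        ids = ids + [next_token(h + strength * steer_vec)]
    return ids
```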
arXiv Detail & Related papers (2025-08-25T03:01:30Z)
- Action-Constrained Imitation Learning [12.316546911223263]
Policy learning under action constraints plays a central role in ensuring safe behaviors in various robot control and resource allocation applications. In this paper, we study a new problem setting termed Action-Constrained Imitation Learning (ACIL), where an action-constrained imitator aims to learn from a demonstrative expert with a larger action space. We tackle this mismatch through trajectory alignment and propose DTWIL, which replaces the original expert demonstrations with a surrogate dataset that follows similar state trajectories while adhering to the action constraints.
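A minimal sketch of the trajectory-matching step suggested by the name DTWIL, using plain dynamic time warping over state trajectories; the surrogate-generation and policy-learning stages are omitted:

```python
import numpy as np

# Hypothetical sketch of the trajectory-matching step suggested by DTWIL's
# name, using plain dynamic time warping (DTW) over state trajectories; the
# surrogate-generation and policy-learning stages are not shown.

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW cost between two state trajectories (T1 x d and T2 x d arrays)."""
    t1, t2 = len(a), len(b)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return float(cost[t1, t2])

def pick_surrogate(expert_states, candidate_rollouts):
    """Choose the action-constrained rollout whose states best track the
    expert trajectory under DTW."""
    return min(candidate_rollouts,
               key=lambda r: dtw_distance(expert_states, r["states"]))
```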
arXiv Detail & Related papers (2025-08-20T03:19:07Z)
- Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models [26.005367102695317]
Multimodal Large Language Models can exhibit difficulty in distinguishing task-relevant from irrelevant signals. We show that spurious information from irrelevant modalities often leads to significant performance degradation. We propose a novel framework to finetune MLLMs, including perturbation-based data augmentation with both heuristic and adversarial perturbations.
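A minimal sketch of perturbation-based augmentation against modality interference, with an illustrative example schema; the adversarial variant is omitted:

```python
import random

# Hypothetical sketch of perturbation-based augmentation against modality
# interference: perturb the irrelevant modality while keeping the label, to
# encourage invariance. The example schema ("answer_modality", "image",
# "question") is illustrative, and the adversarial variant is omitted.

def augment(batch, image_pool):
    """Return the original examples plus modality-perturbed copies."""
    out = []
    for ex in batch:
        out.append(ex)
        if ex["answer_modality"] == "text":
            # The image is irrelevant: swap in a random distractor image.
            out.append(dict(ex, image=random.choice(image_pool)))
        elif ex["answer_modality"] == "image":
            # The text context is irrelevant: append a distractor sentence.
            out.append(dict(ex, question=ex["question"] +
                            " (Note: the shop was closed on Tuesday.)"))
    return out
```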
arXiv Detail & Related papers (2025-05-26T07:31:32Z)
- The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples. We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts. Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve.
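A minimal sketch of the correlation analysis this finding implies, with placeholder per-task accuracies (the paper's tasks and statistics may differ):

```python
from scipy.stats import spearmanr

# Hypothetical sketch of the correlation analysis this finding implies:
# compare per-task accuracy of an instruction-tuned model with the few-shot
# (in-context) accuracy of its base model. Numbers are placeholders.

base_icl_acc = [0.62, 0.41, 0.88, 0.17, 0.73]  # base model, k-shot prompts
instruct_acc = [0.65, 0.39, 0.91, 0.22, 0.70]  # instruction-tuned, zero-shot

rho, p = spearmanr(base_icl_acc, instruct_acc)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")  # high rho supports the claim
```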
arXiv Detail & Related papers (2025-01-15T10:57:55Z)
- Focus On This, Not That! Steering LLMs with Adaptive Feature Specification [48.27684487597968]
Focus Instruction Tuning (FIT) trains large language models to condition their responses by focusing on specific features whilst ignoring others, leading to different behaviours based on what features are specified. We demonstrate that FIT (i) successfully steers behaviour at inference time; (ii) increases robustness by amplifying core task signals and down-weighting spurious cues; (iii) mitigates social bias by suppressing demographic attributes; and (iv) generalises under distribution shifts and to previously unseen focus features.
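A minimal sketch of how focus-style prompts might be assembled for such training; the template wording is illustrative, not FIT's actual format:

```python
# Hypothetical sketch of how focus-style prompts might be assembled for such
# training; the template wording is illustrative, not FIT's actual format.

def focus_prompt(task: str, focus: list[str], ignore: list[str]) -> str:
    """Prepend explicit focus/ignore directives to a task instruction."""
    return (f"Focus on: {', '.join(focus)}.\n"
            f"Ignore: {', '.join(ignore)}.\n"
            f"{task}")

# The same input yields different target behaviours depending on the
# specified features, which is the conditioning FIT trains for.
print(focus_prompt("Classify the review's sentiment.",
                   focus=["the reviewer's opinion"],
                   ignore=["the reviewer's stated gender"]))
```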
arXiv Detail & Related papers (2024-10-30T12:01:48Z)
- InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance [56.184255657175335]
We develop InferAligner, a novel inference-time alignment method that utilizes cross-model guidance for harmlessness alignment.
Experimental results show that our method can be applied very effectively to domain-specific models in finance, medicine, and mathematics.
It significantly diminishes the Attack Success Rate (ASR) of both harmful instructions and jailbreak attacks, while maintaining almost unchanged performance in downstream tasks.
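A minimal sketch of cross-model guidance at inference time, where a safety direction derived from an aligned model shifts the target model's activations only for prompts judged harmful; all components here are illustrative stand-ins:

```python
import torch

# Hypothetical sketch of cross-model guidance at inference time: a safety
# direction derived from an aligned model shifts the target model's hidden
# state only when a harmfulness detector fires, leaving benign queries (and
# hence downstream task performance) untouched. All components here are
# illustrative stand-ins, not InferAligner's actual implementation.

def guided_hidden(h: torch.Tensor, safety_vec: torch.Tensor,
                  harm_score: float, threshold: float = 0.5,
                  alpha: float = 4.0) -> torch.Tensor:
    """Shift the hidden state along the safety direction for harmful prompts."""
    return h + alpha * safety_vec if harm_score > threshold else h

h = torch.randn(4096)
safety_vec = torch.randn(4096)  # e.g., an activation difference direction
print(guided_hidden(h, safety_vec, harm_score=0.9).shape)
```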
arXiv Detail & Related papers (2024-01-20T10:41:03Z)
- Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt [80.43623986759691]
We introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD.
The framework equips unsupervised anomaly detection (UAD) with continual learning capability through contrastively-learned prompts.
We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation.
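A minimal sketch of a prompt-bank lookup consistent with this description, in which each past task stores a key embedding, a learned prompt, and normal-feature prototypes; this is illustrative, not UCAD's actual design:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a prompt-bank lookup consistent with the description
# above: each past task stores a key embedding, a learned prompt, and normal-
# feature prototypes. Components are illustrative, not UCAD's actual design.

def detect(image_feat, bank, encode_with_prompt):
    """Pick the task whose key best matches the image, then score anomaly as
    distance to that task's stored normal prototypes."""
    keys = torch.stack([entry["key"] for entry in bank])
    task = int(F.cosine_similarity(keys, image_feat.unsqueeze(0)).argmax())
    entry = bank[task]
    feat = encode_with_prompt(image_feat, entry["prompt"])
    dists = torch.cdist(feat.unsqueeze(0), entry["prototypes"]).squeeze(0)
    return task, dists.min().item()  # anomaly score: nearest-prototype distance
```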
arXiv Detail & Related papers (2024-01-02T03:37:11Z)