Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering
- URL: http://arxiv.org/abs/2511.00617v1
- Date: Sat, 01 Nov 2025 16:46:03 GMT
- Title: Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering
- Authors: Eric Bigelow, Daniel Wurgaft, YingQiao Wang, Noah Goodman, Tomer Ullman, Hidenori Tanaka, Ekdeep Singh Lubana
- Abstract summary: Large language models (LLMs) can be controlled at inference time through prompts (in-context learning) and internal activations (activation steering). This work offers a unified account of prompt-based and activation-based control of LLM behavior, and a methodology for empirically predicting the effects of these interventions.
- Score: 22.666436755894328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) can be controlled at inference time through prompts (in-context learning) and internal activations (activation steering). Different accounts have been proposed to explain these methods, yet their common goal of controlling model behavior raises the question of whether these seemingly disparate methodologies can be seen as specific instances of a broader framework. Motivated by this, we develop a unifying, predictive account of LLM control from a Bayesian perspective. Specifically, we posit that both context- and activation-based interventions impact model behavior by altering its belief in latent concepts: steering operates by changing concept priors, while in-context learning leads to an accumulation of evidence. This results in a closed-form Bayesian model that is highly predictive of LLM behavior across context- and activation-based interventions in a set of domains inspired by prior work on many-shot in-context learning. This model helps us explain prior empirical phenomena (e.g., sigmoidal learning curves as in-context evidence accumulates) while predicting novel ones (e.g., additivity of both interventions in log-belief space, which results in distinct phases such that sudden and dramatic behavioral shifts can be induced by slightly changing intervention controls). Taken together, this work offers a unified account of prompt-based and activation-based control of LLM behavior, and a methodology for empirically predicting the effects of these interventions.
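To make the additivity claim concrete, here is a minimal sketch of the log-belief picture described in the abstract. All names and constants (prior_logodds, steer_shift, evidence_per_example) are illustrative placeholders rather than the paper's notation; the assumptions are a single binary latent concept, a steering intervention that shifts the prior log-odds, and i.i.d. in-context examples that each contribute a fixed log-likelihood increment.

```python
import numpy as np

def belief(n_examples, prior_logodds=-3.0, steer_shift=0.0, evidence_per_example=0.8):
    """Belief in a latent concept under an additive log-odds model:
    log-odds = prior + steering shift + n_examples * per-example evidence."""
    logodds = prior_logodds + steer_shift + n_examples * evidence_per_example
    return 1.0 / (1.0 + np.exp(-logodds))  # sigmoid of the accumulated log-odds

# Sigmoidal learning curve as in-context evidence accumulates:
print([round(belief(n), 3) for n in range(10)])

# Additivity in log-odds space: near the decision boundary, a small change in
# the steering control flips the dominant behavior (the "distinct phases"):
print(round(belief(3, steer_shift=0.4), 3), round(belief(3, steer_shift=0.8), 3))
```

Because steering and in-context examples enter the same log-odds sum, either control can substitute for the other, which is one way to read the paper's framing of the two interventions as dual.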
Related papers
- Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models [77.98801218316505]
Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning.
We investigate the internal processing of LLMs during in-context concept inference.
arXiv Detail & Related papers (2026-02-08T03:14:39Z)
- Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics [81.80010043113445]
Local weight fine-tuning, LoRA-based adaptation, and activation-based interventions are studied in isolation.
We present a unified view that frames these interventions as dynamic weight updates induced by a control signal.
Across methods, we observe a consistent trade-off between preference and utility: stronger control increases preference while predictably reducing utility.
arXiv Detail & Related papers (2026-02-02T17:04:36Z)
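As a toy illustration of the weight-update framing in the entry above (a generic construction, not necessarily this paper's method): for a single linear layer, adding a steering vector to the output activation is equivalent to an input-dependent rank-one update of the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 6
W = rng.normal(size=(d_out, d_in))   # frozen layer weights
x = rng.normal(size=d_in)            # one particular input
v = rng.normal(size=d_out)           # steering direction
alpha = 0.7                          # control-signal strength

# Activation-space intervention: h' = W x + alpha * v
h_steered = W @ x + alpha * v

# Equivalent rank-one weight update induced by the control signal:
# W' = W + alpha * v x^T / (x^T x), so that W' x = W x + alpha * v
W_prime = W + alpha * np.outer(v, x) / (x @ x)
assert np.allclose(W_prime @ x, h_steered)
```

The induced update is rank one and depends on the input x, which is what makes it a dynamic update rather than a fixed fine-tuning step.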
- Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns.
Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics.
We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z)
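A toy fixed-point picture of the equilibrium claim (illustrative only; the scalar state, pull strength, and attractor are hypothetical, not the paper's formulation): if each turn pulls the conversation state part-way toward an attractor set by the standing context, the state settles rather than drifting without bound.

```python
# Drift as a controllable equilibrium: x converges to the attractor g.
lam, g = 0.3, 1.0          # hypothetical pull strength and context-set target
x, trace = 0.0, []
for _ in range(25):
    x = (1 - lam) * x + lam * g   # each turn moves part-way toward g
    trace.append(round(x, 3))
print(trace[-1])  # ~= g: an equilibrium controllable via the context (g)
```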
- Herd Behavior: Investigating Peer Influence in LLM-based Multi-Agent Systems [7.140644659869317]
We investigate the dynamics of peer influence in multi-agent systems based on Large Language Models (LLMs).
We show that the gap between self-confidence and perceived confidence in peers significantly impacts an agent's likelihood to conform.
We find that the format in which peer information is presented plays a critical role in modulating the strength of herd behavior.
arXiv Detail & Related papers (2025-05-27T12:12:56Z)
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model [54.64088247291416]
A fundamental objective of manipulation policy design is to enable robots to comprehend human instructions, reason about scene cues, and execute generalized actions in dynamic environments.
Recent autoregressive vision-language-action (VLA) methods inherit common-sense reasoning capabilities from vision-language models (VLMs) for next-action-token prediction.
We introduce HybridVLA, a unified framework that absorbs the continuous nature of diffusion-based actions and the contextual reasoning of autoregression.
arXiv Detail & Related papers (2025-03-13T17:59:52Z)
- Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients [8.39361023005333]
Deep learning models achieve high predictive performance but lack intrinsic interpretability.
Existing local explainability methods focus on associations, neglecting the causal drivers of model predictions.
We introduce a novel framework for local interventional explanations by leveraging recent advances in image-to-image editing models.
arXiv Detail & Related papers (2025-03-07T13:50:37Z)
- Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning [7.412307614007383]
Multimodal learning models are designed to bridge different modalities, such as images and text, by learning a shared representation space.
These models often exhibit a modality gap, where different modalities occupy distinct regions within the shared representation space.
We identify the critical roles of mismatched data pairs and a learnable temperature parameter in causing and perpetuating the modality gap during training.
arXiv Detail & Related papers (2024-12-10T20:36:49Z)
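For reference on the learnable temperature mentioned above, here is a minimal NumPy sketch of the standard symmetric contrastive (CLIP-style) objective, in which a single learnable log-temperature scales all pairwise similarities; this is the generic formulation, not this paper's exact training setup.

```python
import numpy as np

def clip_loss(img_emb, txt_emb, log_temp):
    """Symmetric InfoNCE loss with a learnable temperature (CLIP-style).
    img_emb, txt_emb: (N, d) arrays; row i of each forms a matched pair."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = np.exp(log_temp) * img @ txt.T   # temperature-scaled cosine sims
    diag = np.arange(len(img))

    def xent(l):  # cross-entropy of each row against its diagonal target
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[diag, diag]).mean()

    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
print(round(float(clip_loss(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)), 2.0)), 3))
```

A larger temperature sharpens the softmax and changes how strongly mismatched pairs are pushed apart, which is the lever this entry implicates in the modality gap.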
- Predictive Minds: LLMs As Atypical Active Inference Agents [0.276240219662896]
Large language models (LLMs) like GPT are often conceptualized as passive predictors, simulators, or even parrots.
We conceptualize LLMs by drawing on the theory of active inference originating in cognitive science and neuroscience.
arXiv Detail & Related papers (2023-11-16T22:11:12Z)
- Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework comprises a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained end-to-end.
arXiv Detail & Related papers (2023-09-30T20:59:42Z)
- Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors (see the sketch after this entry).
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and to mutual-information and curiosity-based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
arXiv Detail & Related papers (2020-10-27T13:17:18Z)
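A minimal sketch of the KL-regularized objective that commonly underlies behavior priors in this line of work, referenced in the entry above; the discrete-action setting, toy sizes, and constants (alpha, gamma) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p_logits, q_logits):
    """KL(pi || pi_0) between discrete action distributions given as logits."""
    p, q = softmax(p_logits), softmax(q_logits)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def kl_regularized_return(rewards, pi_logits, prior_logits, alpha=0.1, gamma=0.99):
    """J = sum_t gamma^t * (r_t - alpha * KL(pi(.|s_t) || pi_0(.|s_t))):
    discounted return, penalized for deviating from the behavior prior pi_0."""
    return sum(
        gamma**t * (r - alpha * kl(pi_logits[t], prior_logits[t]))
        for t, r in enumerate(rewards)
    )

rng = np.random.default_rng(0)
T, A = 5, 3  # toy horizon and action count
print(kl_regularized_return(rng.normal(size=T), rng.normal(size=(T, A)), rng.normal(size=(T, A))))
```

Maximizing J trades task reward against staying close to the prior, which is how the prior regularizes learning and transfers behavior across tasks.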
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.