Simulated Adoption: Decoupling Magnitude and Direction in LLM In-Context Conflict Resolution
- URL: http://arxiv.org/abs/2602.04918v2
- Date: Fri, 06 Feb 2026 04:23:55 GMT
- Title: Simulated Adoption: Decoupling Magnitude and Direction in LLM In-Context Conflict Resolution
- Authors: Long Zhang, Fangwei Lin
- Abstract summary: Large Language Models (LLMs) frequently prioritize conflicting in-context information over pre-existing parametric memory. We show that models do not "unlearn" or suppress the magnitude of internal truths but rather employ a mechanism of geometric displacement.
- Score: 3.0242762196828448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) frequently prioritize conflicting in-context information over pre-existing parametric memory, a phenomenon often termed sycophancy or compliance. However, the mechanistic realization of this behavior remains obscure, specifically how the model resolves these knowledge conflicts through compliance, and whether this suppression arises from signal magnitude dilution or directional geometric alteration within the residual stream. To resolve this, we conducted a layer-wise geometric analysis across Qwen-3-4B, Llama-3.1-8B, and GLM-4-9B, decomposing the residual stream updates induced by counter-factual contexts into radial (norm-based) and angular (cosine-based) components. Our empirical results reject the universality of the "Manifold Dilution" hypothesis, as two of the three architectures maintained stable residual norms despite exhibiting significant performance degradation on factual queries. Instead, we observed that compliance is consistently characterized by "Orthogonal Interference," where the conflicting context injects a steering vector that is quasi-orthogonal to the ground-truth direction, effectively rotating the hidden state representation. This suggests that models do not "unlearn" or suppress the magnitude of internal truths but rather employ a mechanism of geometric displacement to bypass the correct unembedding vector, effectively simulating adoption while preserving the original structural magnitude. These findings challenge scalar confidence metrics for detecting hallucinations and underscore the necessity of vectorial monitoring to distinguish between genuine knowledge integration and superficial in-context mimicry.
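The following is a minimal sketch (not the authors' released code) of the layer-wise radial/angular decomposition the abstract describes, assuming a Hugging Face checkpoint of one of the studied architectures; the prompt pair, the last-token readout position, and the norm-ratio/cosine formulation are illustrative assumptions rather than the paper's exact protocol:

```python
# Sketch only: compare per-layer hidden states for a clean prompt vs. the same
# prompt preceded by a conflicting ("counter-factual") context, and decompose
# the induced shift into a radial (norm-ratio) and an angular (cosine) component.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.1-8B"  # one of the three architectures studied

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

clean = "The capital of France is"
conflict = "Note: the capital of France is Berlin.\n\nThe capital of France is"

def last_token_states(prompt: str):
    """Per-layer hidden states at the final token position."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states: tuple of (num_layers + 1) tensors, each (1, seq_len, d_model)
    return [h[0, -1].float() for h in out.hidden_states]

for layer, (a, b) in enumerate(zip(last_token_states(clean),
                                   last_token_states(conflict))):
    radial = (b.norm() / a.norm()).item()              # "Manifold Dilution" predicts a drop below 1
    angular = F.cosine_similarity(a, b, dim=0).item()  # "Orthogonal Interference" predicts rotation
    print(f"layer {layer:2d}  norm ratio {radial:.3f}  cosine {angular:.3f}")
```

Under the "Orthogonal Interference" account, the norm ratio should stay near 1 across layers while the cosine falls, whereas "Manifold Dilution" would instead predict a shrinking norm ratio; scalar (norm-only) monitoring would miss the former failure mode entirely.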
Related papers
- On the Structural Non-Preservation of Epistemic Behaviour under Policy Transformation [51.56484100374058]
We formalise such information-conditioned interaction patterns as behavioural dependency. This induces a probe-relative notion of behavioural equivalence and a within-policy behavioural distance. Results identify structural conditions under which probe-conditioned behavioural separation is not preserved under common policy transformations.
arXiv Detail & Related papers (2026-02-24T22:55:21Z)
- When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks [2.4923006485141284]
We demonstrate that encoder-side poisoning induces persistent, trigger-free semantic corruption. Backdoors act as low-rank, target-centered deformations that amplify local sensitivity, causing distortion to propagate coherently across semantic neighborhoods. Our findings, validated across diffusion and contrastive paradigms, expose the deep structural risks of encoder poisoning and highlight the necessity of geometric audits beyond simple attack success rates.
arXiv Detail & Related papers (2026-02-21T23:48:04Z)
- Binary Flow Matching: Prediction-Loss Space Alignment for Robust Learning [23.616336786063552]
Flow matching has emerged as a powerful framework for generative modeling. We identify a latent structural mismatch that arises when it is coupled with velocity-based objectives. We prove that re-aligning the objective to the signal space eliminates the singular weighting.
arXiv Detail & Related papers (2026-02-11T02:02:30Z)
- Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis of the Model Context Protocol [69.11739400975445]
We introduce the first theoretical framework for analyzing error accumulation in Model Context Protocol (MCP) agents. We show that cumulative distortion exhibits linear growth and high-probability deviations bounded by $O(\sqrt{T})$. Key findings include: semantic weighting reduces distortion by 80%, and periodic re-grounding approximately every 9 steps suffices for error control.
arXiv Detail & Related papers (2026-02-10T21:08:53Z)
- When Does Context Help? Error Dynamics of Contextual Information in Large Language Models [64.88201012057822]
We present a unified theoretical framework for analyzing the effect of arbitrary contextual information in large language models. Our analysis characterizes contextual influence through output error dynamics. Experiments across ICL, retrieval-augmented generation, and memory evolution validate our theory and motivate a principled context selection strategy.
arXiv Detail & Related papers (2026-02-09T05:58:41Z)
- MirrorLA: Reflecting Feature Map for Vision Linear Attention [49.41670925034762]
Linear attention significantly reduces the computational complexity of Transformers from quadratic to linear, yet it consistently lags behind softmax-based attention in performance. We propose MirrorLA, a geometric framework that substitutes passive truncation with active reorientation. MirrorLA achieves state-of-the-art performance across standard benchmarks, demonstrating that strictly linear efficiency can be achieved without compromising representational fidelity.
arXiv Detail & Related papers (2026-02-04T09:14:09Z)
- FlexCausal: Flexible Causal Disentanglement via Structural Flow Priors and Manifold-Aware Interventions [1.7114074082429929]
Causal Disentangled Representation Learning (CDRL) aims to learn and disentangle low-dimensional representations from observations. We propose FlexCausal, a novel CDRL framework based on a block-diagonal covariance VAE. Our framework ensures a precise structural correspondence between the learned latent subspaces and the ground-truth causal relations.
arXiv Detail & Related papers (2026-01-29T11:30:53Z)
- ARGUS: Adaptive Rotation-Invariant Geometric Unsupervised System [0.0]
This paper introduces Argus, a framework that reconceptualizes drift detection as tracking local statistics over a fixed spatial partition of the data manifold. Voronoi tessellations over canonical orthonormal frames yield drift metrics that are invariant to transformations. A graph-theoretic characterization of drift propagation is developed that distinguishes coherent distributional shifts from isolated perturbations.
arXiv Detail & Related papers (2026-01-03T22:39:20Z)
- Manifold Percolation: from generative model to Reinforce learning [0.26905021039717986]
Generative modeling is typically framed as learning mapping rules, but from an observer's perspective without access to these rules, the task becomes disentangling the geometric support from the probability distribution. We propose that continuum percolation is uniquely suited to this support analysis, as the sampling process effectively projects high-dimensional density estimation onto a geometric counting problem on the support.
arXiv Detail & Related papers (2025-11-25T17:12:42Z)
- Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z)
- REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model [29.40036398095681]
We define the Reasoning Manifold, a latent low-dimensional geometric structure formed by the internal representations corresponding to all correctly reasoned generations. We build REMA, a framework that explains the origins of failures by quantitatively comparing the spatial relationships of internal model representations corresponding to both erroneous and correct reasoning samples. Our experiments on diverse language and multimodal models and tasks demonstrate the low-dimensional nature of the reasoning manifold and the high separability between erroneous and correct reasoning representations.
arXiv Detail & Related papers (2025-09-26T16:02:27Z)
- Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts [80.32933059529135]
Test-Time Adaptation (TTA) methods have emerged to adapt to target distributions during inference. We propose Dual Uncertainty Optimization (DUO), the first TTA framework designed to jointly minimize both uncertainties for robust monocular 3D object detection (M3OD). In parallel, we design a semantic-aware normal field constraint that preserves geometric coherence in regions with clear semantic cues.
arXiv Detail & Related papers (2025-08-28T07:09:21Z)
- Curved Inference: Concern-Sensitive Geometry in Large Language Model Residual Streams [0.0]
We propose a geometric interpretability framework that tracks how the residual stream trajectory of a large language model bends in response to shifts in semantic concern. We analyse Gemma3-1b and LLaMA3.2-3b using five native-space metrics, with a primary focus on curvature ($\kappa_i$) and salience ($S(t)$). We find that concern-shifted prompts reliably alter internal activation trajectories in both models.
arXiv Detail & Related papers (2025-07-08T23:05:00Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- An Indirect Rate-Distortion Characterization for Semantic Sources: General Model and the Case of Gaussian Observation [83.93224401261068]
The source model is motivated by the recent surge of interest in the semantic aspect of information. The intrinsic state corresponds to the semantic feature of the source, which in general is not observable. The resulting rate-distortion function is the semantic rate-distortion function of the source.
arXiv Detail & Related papers (2022-01-29T02:14:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.