Related papers: Drift No More? Context Equilibria in Multi-Turn LLM Interactions


URL: http://arxiv.org/abs/2510.07777v1
Date: Thu, 09 Oct 2025 04:48:49 GMT
Title: Drift No More? Context Equilibria in Multi-Turn LLM Interactions
Authors: Vardhan Dongre, Ryan A. Rossi, Viet Dac Lai, David Seunghyun Yoon, Dilek Hakkani-Tür, Trung Bui,
Abstract summary: Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
Score: 58.69551510148673
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Large Language Models (LLMs) excel at single-turn tasks such as instruction following and summarization, yet real-world deployments require sustained multi-turn interactions where user goals and conversational context persist and evolve. A recurring challenge in this setting is context drift: the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. In this work, we present a study of context drift in multi-turn interactions and propose a simple dynamical framework to interpret its behavior. We formalize drift as the turn-wise KL divergence between the token-level predictive distributions of the test model and a goal-consistent reference model, and propose a recurrence model that interprets its evolution as a bounded stochastic process with restoring forces and controllable interventions. We instantiate this framework in both synthetic long-horizon rewriting tasks and realistic user-agent simulations such as in $\tau$-Bench, measuring drift for several open-weight LLMs that are used as user simulators. Our experiments consistently reveal stable, noise-limited equilibria rather than runaway degradation, and demonstrate that simple reminder interventions reliably reduce divergence in line with theoretical predictions. Together, these results suggest that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay, providing a foundation for studying and mitigating context drift in extended interactions.
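The abstract defines drift as a turn-wise KL divergence between token-level predictive distributions and models its evolution as a bounded stochastic recurrence with restoring forces and interventions. It does not state the exact equations, so the Python sketch below is a hypothetical illustration only. Several parts of it are assumptions, not the paper's formulation: the KL direction (test model relative to reference), averaging over token positions within a turn, and the linear mean-reverting update with Gaussian noise and periodic reminder "kicks". The names `turn_drift` and `simulate_drift` and the parameters `d_star`, `alpha`, `sigma`, `delta`, and `reminder_every` are also illustrative.

```python
import math
import random

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    # eps guards against log(0) when q assigns zero mass to a token that p uses.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def turn_drift(test_dists, ref_dists):
    """Drift for one turn: mean per-token KL(test || reference).

    Assumption: the KL direction and mean aggregation over token positions
    are illustrative choices, not taken from the paper.
    """
    if len(test_dists) != len(ref_dists) or not test_dists:
        raise ValueError("need equal-length, non-empty lists of per-token distributions")
    total = sum(kl_divergence(p, q) for p, q in zip(test_dists, ref_dists))
    return total / len(test_dists)

def simulate_drift(turns, d0=0.0, d_star=0.5, alpha=0.3, sigma=0.0,
                   reminder_every=None, delta=0.0, seed=0):
    """Hypothetical bounded recurrence for turn-wise drift D_t.

    D_{t+1} = max(0, D_t + alpha * (d_star - D_t) + noise_t - delta * reminder_t)
      alpha * (d_star - D_t): restoring force pulling D_t toward d_star
      noise_t ~ N(0, sigma^2): per-turn stochastic perturbation
      reminder_t = 1 on turns divisible by reminder_every (an intervention)
      max(0, ...): keeps drift non-negative, as a KL divergence must be
    """
    rng = random.Random(seed)
    d = d0
    trajectory = [d]
    for t in range(1, turns):
        restoring = alpha * (d_star - d)
        noise = rng.gauss(0.0, sigma)
        reminder = 1.0 if reminder_every and t % reminder_every == 0 else 0.0
        d = max(0.0, d + restoring + noise - delta * reminder)
        trajectory.append(d)
    return trajectory

if __name__ == "__main__":
    # Two-token turn: the first position disagrees with the reference, the second agrees.
    test = [[1.0, 0.0], [0.5, 0.5]]
    ref = [[0.5, 0.5], [0.5, 0.5]]
    print(round(turn_drift(test, ref), 4))  # ln(2) / 2, prints 0.3466

    # Same noise sequence (same seed), with and without periodic reminders.
    base = simulate_drift(40, sigma=0.05)
    with_reminders = simulate_drift(40, sigma=0.05, reminder_every=5, delta=0.2)
    print(sum(base) / len(base), sum(with_reminders) / len(with_reminders))
```

With `sigma=0` and no reminders, the update has a fixed point at `d_star`, so the trajectory settles at a stable level instead of growing without bound. Adding noise makes it fluctuate around that level, and periodic reminders push the whole trajectory lower. This loosely mirrors the qualitative behavior the abstract reports (noise-limited equilibria, with reminders reducing divergence), but it is a toy model and not the authors' recurrence.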

Related papers

Closing the Loop: A Control-Theoretic Framework for Provably Stable Time Series Forecasting with LLMs [22.486083545585984]
Large Language Models (LLMs) have recently shown exceptional potential in time series forecasting. Existing approaches typically employ a naive autoregressive generation strategy. We propose F-LLM, a novel closed-loop framework.
arXiv Detail & Related papers (2026-02-13T09:35:12Z)
Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision [11.159231524113764]
Reinforcement Learning (RL) has emerged as a pivotal mechanism for enhancing the complex reasoning capabilities of Multimodal Large Language Models (MLLMs). In this paper, we propose the Guided Verifier framework to address these structural limitations. We develop a specialized data synthesis pipeline targeting multimodal hallucinations, constructing the CoRe dataset of process-level negatives and Correct-guide Reasoning trajectories to train the guided verifier.
arXiv Detail & Related papers (2026-02-04T07:38:42Z)
Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts [74.47786985522762]
We identify a critical failure mode termed textual inertia, where models tend to blindly adhere to the erroneous text while neglecting conflicting visual evidence. We propose the LogicGraph Perturbation Protocol that structurally injects perturbations into the reasoning chains of diverse LMMs. Results reveal that models successfully self-correct in less than 10% of cases and predominantly succumb to blind textual error propagation.
arXiv Detail & Related papers (2026-01-07T16:39:34Z)
Benchmarking neural surrogates on realistic spatiotemporal multiphysics flows [18.240532888032394]
We present REALM (REalistic AI Learning for Multiphysics), a rigorous benchmarking framework designed to test neural surrogates on challenging, application-driven reactive flows. We benchmark over a dozen representative surrogate model families, including spectral operators, convolutional models, Transformers, pointwise operators, and graph/mesh networks. We identify three robust trends: (i) a scaling barrier governed jointly by dimensionality, stiffness, and mesh irregularity, leading to rapidly growing rollout errors; (ii) performance primarily controlled by architectural inductive biases rather than parameter count; and (iii) a persistent gap between nominal accuracy metrics and physically
arXiv Detail & Related papers (2025-12-21T05:04:13Z)
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach [78.4812458793128]
We propose TACO, a test-time-scaling framework that applies a lightweight pseudo-count estimator as a high-fidelity verifier of action chunks. Our method resembles the classical anti-exploration principle in offline reinforcement learning (RL), and because it is gradient-free, it offers significant computational benefits.
arXiv Detail & Related papers (2025-12-02T14:42:54Z)
Lyapunov-Stable Adaptive Control for Multimodal Concept Drift [1.4864895279988264]
This paper introduces LS-OGD, a novel adaptive control framework for robust multimodal learning in the presence of concept drift. Under bounded drift conditions, the LS-OGD system's prediction error is uniformly ultimately bounded and converges to zero if the drift ceases.
arXiv Detail & Related papers (2025-10-09T18:55:26Z)
When Context Is Not Enough: Modeling Unexplained Variability in Car-Following Behavior [22.102157707436884]
Traditional deterministic models often fail to capture the full extent of variability and unpredictability in human driving. This study introduces an interpretable modeling framework that captures not only context-dependent dynamics but also residual variability beyond what context can explain. The integration of interpretability and accuracy makes this framework a promising tool for traffic analysis and safety-critical applications.
arXiv Detail & Related papers (2025-07-09T16:42:41Z)
RIFT: Group-Relative RL Fine-Tuning for Realistic and Controllable Traffic Simulation [13.319344167881383]
We introduce a dual-stage AV-centric simulation framework that conducts imitation learning pre-training in a data-driven simulator. We then perform fine-tuning in a physics-based simulator to enhance style-level controllability. In the fine-tuning stage, we propose RIFT, a novel group-relative RL fine-tuning strategy.
arXiv Detail & Related papers (2025-05-06T09:12:37Z)
Sequential Representation Learning via Static-Dynamic Conditional Disentanglement [58.19137637859017]
This paper explores self-supervised disentangled representation learning within sequential data, focusing on separating time-independent and time-varying factors in videos. We propose a new model that breaks the usual independence assumption between those factors by explicitly accounting for the causal relationship between the static/dynamic variables. Experiments show that the proposed approach outperforms previous complex state-of-the-art techniques in scenarios where the dynamics of a scene are influenced by its content.
arXiv Detail & Related papers (2024-08-10T17:04:39Z)
Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting [11.106812447960186]
We introduce a novel trajectory generator named Controllable Diffusion Trajectory (CDT). CDT integrates information and social interactions into a Transformer-based conditional denoising diffusion model to guide the prediction of future trajectories. To ensure multimodality, we incorporate behavioral tokens to direct the trajectory's modes, such as going straight, turning right or left.
arXiv Detail & Related papers (2024-02-06T13:16:54Z)
Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy. At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling. We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
On Learning the Tail Quantiles of Driving Behavior Distributions via Quantile Regression and Flows [13.540998552232006]
We consider the problem of learning models that accurately capture the diversity and tail quantiles of human driver behavior probability distributions. We adapt two flexible quantile learning frameworks for this setting that avoid strong distributional assumptions. We evaluate our approach in a one-step acceleration prediction task, and in multi-step driver simulation rollouts.
arXiv Detail & Related papers (2023-05-22T15:09:04Z)
Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model. A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations. We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.