Related papers: Geometry of Drifting MDPs with Path-Integral Stability Certificates

URL: http://arxiv.org/abs/2601.21991v1
Date: Thu, 29 Jan 2026 17:03:23 GMT
Title: Geometry of Drifting MDPs with Path-Integral Stability Certificates
Authors: Zuyuan Zhang, Mahdi Imani, Tian Lan
Abstract summary: Real-world reinforcement learning is often nonstationary: rewards and dynamics drift, accelerate, oscillate, and trigger abrupt switches in the optimal action. We take a geometric view of nonstationary discounted Markov Decision Processes (MDPs) by modeling the environment as a differentiable homotopy path and tracking the induced motion of the optimal Bellman fixed point. This yields a length–curvature–kink signature of intrinsic complexity: cumulative drift, acceleration/oscillation, and action-gap-induced nonsmoothness.
Score: 14.721539799090904
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-world reinforcement learning is often nonstationary: rewards and dynamics drift, accelerate, oscillate, and trigger abrupt switches in the optimal action. Existing theory often represents nonstationarity with coarse-scale models that measure how much the environment changes, not how it changes locally, even though acceleration and near-ties drive tracking error and policy chattering. We take a geometric view of nonstationary discounted Markov Decision Processes (MDPs) by modeling the environment as a differentiable homotopy path and tracking the induced motion of the optimal Bellman fixed point. This yields a length–curvature–kink signature of intrinsic complexity: cumulative drift, acceleration/oscillation, and action-gap-induced nonsmoothness. We prove a solver-agnostic path-integral stability bound and derive gap-safe feasible regions that certify local stability away from switch regimes. Building on these results, we introduce Homotopy-Tracking RL (HT-RL) and HT-MCTS, lightweight wrappers that estimate replay-based proxies of length, curvature, and near-tie proximity online and adapt learning or planning intensity accordingly. Experiments show improved tracking and dynamic regret over matched static baselines, with the largest gains in oscillatory and switch-prone regimes.
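To make the length, curvature, and near-tie ideas concrete, here is a minimal Python sketch on a toy tabular MDP. It is not the authors' HT-RL or HT-MCTS code, and every name in it (`drifting_mdp`, `path_proxies`, `min_action_gap`, `residual_path_bound`) and the toy MDP itself are hypothetical. It drifts one reward along a discretized path t in [0, 1], solves each snapshot's optimal Q fixed point by value iteration, and computes sup-norm proxies: path length as a sum of first differences, curvature as a sum of second differences, and near-tie proximity as the smallest best-minus-second-best action gap. It also checks a generic contraction fact, ||Q*_{k+1} - Q*_k|| <= ||T_{k+1} Q*_k - Q*_k|| / (1 - gamma), summed along the path. That sum is a standard Banach fixed-point argument used here as a stand-in; it is not the paper's path-integral theorem.

```python
"""Toy sketch: tracking Q* along a drifting tabular MDP (hypothetical, not HT-RL)."""
from typing import List, Tuple

QTable = List[List[float]]  # Q[s][a]


def sup_dist(q1: QTable, q2: QTable) -> float:
    """Sup-norm distance between two Q-tables of the same shape."""
    return max(abs(x - y) for r1, r2 in zip(q1, q2) for x, y in zip(r1, r2))


def bellman_opt(r, P, q: QTable, gamma: float) -> QTable:
    """(Tq)(s,a) = r(s,a) + gamma * sum_s' P(s'|s,a) * max_a' q(s',a')."""
    v = [max(row) for row in q]
    n_s = len(r)
    return [[r[s][a] + gamma * sum(P[s][a][s2] * v[s2] for s2 in range(n_s))
             for a in range(len(r[s]))]
            for s in range(n_s)]


def solve_q(r, P, gamma: float, tol: float = 1e-10) -> QTable:
    """Value iteration; converges because T is a gamma-contraction in sup norm."""
    q = [[0.0] * len(row) for row in r]
    while True:
        q_new = bellman_opt(r, P, q, gamma)
        if sup_dist(q_new, q) < tol:
            return q_new
        q = q_new


def drifting_mdp(t: float):
    """Hypothetical 2-state, 2-action MDP; only reward (state 0, action 1) drifts."""
    r = [[1.0, 0.5 + 2.0 * t],
         [0.0, 0.2]]
    # Deterministic moves: action 0 stays put, action 1 jumps to the other state.
    P = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
    return r, P


def path_proxies(qs: List[QTable]) -> Tuple[float, float]:
    """Discrete length (sum of first differences) and curvature (sum of second differences)."""
    length = sum(sup_dist(qs[k + 1], qs[k]) for k in range(len(qs) - 1))
    curvature = 0.0
    for k in range(1, len(qs) - 1):
        curvature += max(abs(a - 2.0 * b + c)
                         for ra, rb, rc in zip(qs[k + 1], qs[k], qs[k - 1])
                         for a, b, c in zip(ra, rb, rc))
    return length, curvature


def min_action_gap(q: QTable) -> float:
    """Smallest best-minus-second-best gap over states; small values flag near-ties."""
    return min(s[-1] - s[-2] for s in (sorted(row) for row in q))


def residual_path_bound(mdps, qs: List[QTable], gamma: float) -> float:
    """Sum of ||T_{k+1} Q_k - Q_k||_inf / (1 - gamma) along the discretized path.

    Each term upper-bounds ||Q*_{k+1} - Q_k||_inf because T_{k+1} is a
    gamma-contraction with fixed point Q*_{k+1} (generic Banach argument).
    """
    total = 0.0
    for k in range(len(qs) - 1):
        r_next, P_next = mdps[k + 1]
        total += sup_dist(bellman_opt(r_next, P_next, qs[k], gamma), qs[k]) / (1.0 - gamma)
    return total


if __name__ == "__main__":
    gamma, K = 0.9, 10
    mdps = [drifting_mdp(k / K) for k in range(K + 1)]
    qs = [solve_q(r, P, gamma) for r, P in mdps]
    length, curvature = path_proxies(qs)
    bound = residual_path_bound(mdps, qs, gamma)
    gaps = [min_action_gap(q) for q in qs]
    print(f"length={length:.3f}  curvature={curvature:.3f}  residual bound={bound:.3f}")
    print(f"smallest action gap: {min(gaps):.3f} at t={gaps.index(min(gaps)) / K}")
```

In this toy path the optimal action in state 0 flips from action 0 at t = 0 to action 1 at t = 1, and Q* is piecewise affine in t. As a result, the second-difference proxy is nonzero only around the switch, which is also where the smallest action gap appears. This is a small-scale analogue of the "kink" term. An adaptive wrapper in the spirit of HT-RL could scale solver or replay updates with these quantities, but any specific scaling rule would be an assumption, not the paper's.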

Related papers

When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift [64.37959940809633]
We study the robustness of Proximal Policy Optimization (PPO) under temporally persistent sensor failures. We show that Transformer-based sequence policies substantially outperform RNN and SSM policies in robustness, maintaining high returns even when large fractions of sensors are unavailable.
arXiv Detail & Related papers (2026-03-04T22:21:54Z)
ConsistentRFT: Reducing Visual Hallucinations in Flow-based Reinforcement Fine-Tuning [85.20505958752928]
Reinforcement Fine-Tuning (RFT) on flow-based models is crucial for preference alignment. However, RFT often introduces visual hallucinations such as over-optimized details and semantic misalignment. This work takes a preliminary look at why visual hallucinations arise and how to reduce them.
arXiv Detail & Related papers (2026-02-03T11:49:46Z)
On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization [0.0]
We analyze the tracking performance of gradient descent under uniform strong convexity and smoothness in varying stepsize regimes. We show that momentum can substantially amplify drift-induced tracking error, with an explicit penalty on tracking capability. These results provide a definitive theoretical grounding for the empirical instability of momentum in dynamic environments.
arXiv Detail & Related papers (2026-01-18T03:27:21Z)
Guided Path Sampling: Steering Diffusion Models Back on Track with Principled Path Guidance [5.814544128372275]
We propose Guided Path Sampling (GPS) as a new paradigm for iterative refinement. GPS replaces unstable extrapolation with a principled, manifold-constrained alternative, ensuring the sampling path remains on the data manifold. GPS outperforms existing methods in both perceptual quality and complex prompt adherence.
arXiv Detail & Related papers (2025-12-28T11:12:56Z)
Unifying Sign and Magnitude for Optimizing Deep Vision Networks via ThermoLion [0.0]
Current paradigms impose a static compromise on information channel drift parameters. We introduce a "low-dimensional" exploration model and a "low-dimensional" dynamic alignment framework.
arXiv Detail & Related papers (2025-12-01T17:04:17Z)
Reasoning in Diffusion Large Language Models is Concentrated in Dynamic Confusion Zones [3.7312377768685714]
We propose Adaptive Trajectory Policy Optimization (ATPO), a lightweight step-selection strategy that dynamically reallocates gradient updates to high-leverage steps without changing the RL objective, rewards, or compute budget. ATPO delivers substantial gains in reasoning accuracy and training stability across benchmarks, showing that exploiting trajectory dynamics is key to advancing dLLM RL.
arXiv Detail & Related papers (2025-11-19T07:59:34Z)
Towards Stable and Structured Time Series Generation with Perturbation-Aware Flow Matching [16.17115009663765]
We introduce PAFM, a framework that models perturbed trajectories to ensure stable and structurally consistent time series generation. The framework incorporates perturbation-guided training to simulate localized disturbances and leverages a dual-path velocity field to capture trajectory deviations under perturbation. In experiments on both unconditional and conditional generation tasks, PAFM consistently outperforms strong baselines.
arXiv Detail & Related papers (2025-11-18T13:30:56Z)
INC: An Indirect Neural Corrector for Auto-Regressive Hybrid PDE Solvers [61.84396402100827]
We propose the Indirect Neural Corrector (INC), which integrates learned corrections into the governing equations. INC reduces the error amplification on the order of $\Delta t^{-1} + L$, where $\Delta t$ is the timestep and $L$ the Lipschitz constant. We test INC on extensive benchmarks covering numerous differentiable solvers, neural backbones, and test cases ranging from a 1D chaotic system to 3D turbulence.
arXiv Detail & Related papers (2025-11-16T20:14:28Z)
ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z)
Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z)
Forecasting Continuous Non-Conservative Dynamical Systems in SO(3) [51.510040541600176]
We propose a novel approach to modeling the rotation of moving objects in computer vision. Our approach is agnostic to energy and momentum conservation while being robust to input noise. By learning to approximate object dynamics from noisy states during training, our model attains robust extrapolation capabilities in simulation and various real-world settings.
arXiv Detail & Related papers (2025-08-11T09:03:10Z)
MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking [58.719310295870024]
This paper presents an event-based framework for tracking any point. To resolve ambiguities caused by event sparsity, a motion-guidance module incorporates kinematic vectors into the local matching process. The method improves the $Survival_{50}$ metric by 17.9% over the event-only tracking-any-point baseline.
arXiv Detail & Related papers (2024-12-02T09:13:29Z)
Pushing the Envelope of Rotation Averaging for Visual SLAM [69.7375052440794]
We propose a novel optimization backbone for visual SLAM systems, leveraging rotation averaging to improve the accuracy, efficiency, and robustness of conventional monocular SLAM systems. Our approach runs up to 10x faster than the state of the art on public benchmarks, with comparable accuracy.
arXiv Detail & Related papers (2020-11-02T18:02:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.