Decoupled Continuous-Time Reinforcement Learning via Hamiltonian Flow
- URL: http://arxiv.org/abs/2602.14587v1
- Date: Mon, 16 Feb 2026 09:35:25 GMT
- Title: Decoupled Continuous-Time Reinforcement Learning via Hamiltonian Flow
- Authors: Minh Nguyen
- Abstract summary: Real-world control problems evolve in continuous time with non-uniform, event-driven decisions.
As time gaps shrink, the $Q$-function collapses to the value function $V$, eliminating action ranking.
Existing continuous-time methods reintroduce action information via an advantage-rate function $q$.
We propose a novel decoupled continuous-time actor-critic algorithm with alternating updates.
- Score: 1.8824572526199168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world control problems, ranging from finance to robotics, evolve in continuous time with non-uniform, event-driven decisions. Standard discrete-time reinforcement learning (RL), based on fixed-step Bellman updates, struggles in this setting: as time gaps shrink, the $Q$-function collapses to the value function $V$, eliminating action ranking. Existing continuous-time methods reintroduce action information via an advantage-rate function $q$. However, they enforce optimality through complicated martingale losses or orthogonality constraints, which are sensitive to the choice of test processes. These approaches entangle $V$ and $q$ into a large, complex optimization problem that is difficult to train reliably. To address these limitations, we propose a novel decoupled continuous-time actor-critic algorithm with alternating updates: $q$ is learned from diffusion generators on $V$, and $V$ is updated via a Hamiltonian-based value flow that remains informative under infinitesimal time steps, where standard max/softmax backups fail. Theoretically, we prove rigorous convergence via new probabilistic arguments, sidestepping the challenge that generator-based Hamiltonians lack Bellman-style contraction under the sup-norm. Empirically, our method outperforms prior continuous-time and leading discrete-time baselines across challenging continuous-control benchmarks and a real-world trading task, achieving 21% profit over a single quarter, nearly doubling the second-best method.
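For intuition, below is a minimal, purely illustrative sketch of the alternating update described in the abstract, under stated assumptions: the advantage rate $q$ is obtained from a generator-style residual of $V$ (here a finite-difference surrogate over a small step $dt$), and $V$ is then relaxed along a Hamiltonian-style flow that is stationary exactly where $\max_a q(s,a) = 0$. The toy tabular dynamics, hyperparameters, and variable names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 6, 3
dt, beta, eta = 0.05, 0.1, 1.0          # time step, discount rate, flow step size

# Toy controlled Markov dynamics: transition kernel P[a, s, s'] and reward rate r[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.normal(size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(2000):
    # q-step: advantage rate from a generator-style residual of V,
    #   q(s, a) = r(s, a) + (E[V(s_{t+dt}) | s, a] - V(s)) / dt - beta * V(s)
    EV = np.einsum("asz,z->sa", P, V)              # E[V(s')] for every (s, a)
    q = r + (EV - V[:, None]) / dt - beta * V[:, None]

    # V-step: Hamiltonian-style value flow; at the optimum max_a q(s, a) = 0,
    # so the flow is stationary exactly at the optimal value function.
    V = V + eta * dt * q.max(axis=1)

pi = q.argmax(axis=1)                               # greedy policy induced by q
print("V:", np.round(V, 3), "greedy actions:", pi)
```

The two steps never share a single entangled objective: $q$ is recomputed from the current $V$, and $V$ moves only along the resulting Hamiltonian, which mirrors the decoupling the abstract emphasizes.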
Related papers
- Online Markov Decision Processes with Terminal Law Constraints [10.878763806286157]
We introduce a reset-free framework called the periodic framework.
The goal is to find periodic policies that minimize cumulative loss and return the agents to their initial state distribution after a fixed number of steps.
We give the first algorithms for computing periodic policies in two multi-agent settings and show that they achieve sublinear periodic regret of order $\tilde{O}(T^{3/4})$.
This provides the first non-asymptotic guarantees for reset-free learning in the setting of $M$ homogeneous agents, for $M > 1$.
arXiv Detail & Related papers (2026-01-12T12:46:12Z) - Learning and certification of local time-dependent quantum dynamics and noise [5.1798081822960365]
Hamiltonian learning protocols are essential tools to benchmark quantum computers and simulators.
We learn the time-dependent evolution of a locally interacting $n$-qubit system on a graph of effective dimension $D$.
Our protocol outputs functions approximating the coefficients to accuracy $\epsilon$ on an interval with success probability $1-\delta$.
arXiv Detail & Related papers (2025-10-09T17:39:40Z) - From Continual Learning to SGD and Back: Better Rates for Continual Linear Models [50.11453013647086]
We analyze the forgetting, i.e., loss on previously seen tasks, after $k$ iterations.
We develop novel last-iterate upper bounds in the realizable least squares setup.
We prove for the first time that randomization alone, with no task repetition, can prevent catastrophic forgetting in sufficiently long task sequences.
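A minimal sketch of the continual linear regression setting referenced above, assuming the standard realizable setup (a shared $w^\star$ fits every task) and a projection-style update that moves the current iterate to the nearest exact solution of each new task; the dimensions, task sizes, and update rule here are illustrative assumptions rather than the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_per_task, n_tasks = 20, 5, 30
w_star = rng.normal(size=d)

tasks = []
w = np.zeros(d)
for t in range(n_tasks):
    X = rng.normal(size=(n_per_task, d))
    y = X @ w_star                                  # realizable: no label noise
    tasks.append((X, y))

    # Projection update: w <- w + X^+ (y - X w), the least-norm correction that
    # makes w fit the new task exactly (project w onto {v : X v = y}).
    w = w + np.linalg.pinv(X) @ (y - X @ w)

    # Forgetting: average squared loss over all tasks seen so far.
    forgetting = np.mean([np.mean((Xi @ w - yi) ** 2) for Xi, yi in tasks])
    if (t + 1) % 10 == 0:
        print(f"after task {t + 1}: forgetting = {forgetting:.4f}")
```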
arXiv Detail & Related papers (2025-04-06T18:39:45Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Near-continuous time Reinforcement Learning for continuous state-action spaces [3.5527561584422456]
We consider the Reinforcement Learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory.
Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces.
We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively.
arXiv Detail & Related papers (2023-09-06T08:01:17Z) - Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space [0.0]
We study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes.
We propose an algorithm based on Thompson sampling with dynamically-sized episodes.
We show that our algorithm can be applied to develop approximately optimal control algorithms.
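A rough, hedged sketch of the Thompson-sampling-with-episodes idea mentioned above, specialized to a small tabular MDP with known rewards and a Dirichlet posterior over transitions; the episode-sizing rule used here (lengths growing with the episode index) and all other details are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
nS, nA, gamma = 4, 2, 0.9
P_true = rng.dirichlet(np.ones(nS), size=(nS, nA))     # true transitions P[s, a, s']
R = rng.uniform(size=(nS, nA))                          # known reward table

counts = np.ones((nS, nA, nS))                          # Dirichlet(1, ..., 1) prior
s = 0
for k in range(1, 30):                                  # episode index
    # Thompson sampling: draw one MDP from the posterior over transition kernels.
    P_hat = np.array([[rng.dirichlet(counts[si, ai]) for ai in range(nA)]
                      for si in range(nS)])

    # Plan in the sampled MDP with a few rounds of value iteration.
    V = np.zeros(nS)
    for _ in range(200):
        Q = R + gamma * P_hat @ V                       # Q[s, a]
        V = Q.max(axis=1)
    policy = Q.argmax(axis=1)

    # Dynamically-sized episode: act for k steps, then re-sample the posterior.
    for _ in range(k):
        a = policy[s]
        s_next = rng.choice(nS, p=P_true[s, a])
        counts[s, a, s_next] += 1                       # posterior update
        s = s_next
```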
arXiv Detail & Related papers (2023-06-05T03:57:16Z) - Non-stationary Delayed Online Convex Optimization: From Full-information to Bandit Setting [71.82716109461967]
We propose an algorithm called Mild-OGD for the full-information case, where delayed gradients are available.
We show that the dynamic regret of Mild-OGD can be automatically bounded by $O(\sqrt{\bar{d}T}(P_T+1))$ under the in-order assumption.
We also develop a bandit variant of Mild-OGD for a more challenging case with only delayed loss values.
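To make the delayed-feedback setting concrete, here is a hedged sketch of plain online gradient descent when gradients arrive after a delay (the primitive behind the full-information case above); it is not Mild-OGD itself, whose meta-expert structure the summary does not detail, and the loss functions, delays, and step size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T, d, eta = 50, 3, 0.1
x = np.zeros(d)

targets = rng.normal(size=(T, d))                   # round-t loss f_t(x) = ||x - targets[t]||^2 / 2
delays = rng.integers(1, 5, size=T)                 # gradient of round t arrives at t + delays[t]
pending = {}                                        # arrival time -> list of delayed gradients

for t in range(T):
    # Play x_t and suffer f_t(x_t); the gradient is only revealed after the delay.
    grad_t = x - targets[t]
    pending.setdefault(t + int(delays[t]), []).append(grad_t)

    # Apply every gradient that has arrived by now (possibly several, possibly none).
    for g in pending.pop(t, []):
        x = x - eta * g

print("final decision:", np.round(x, 3))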
arXiv Detail & Related papers (2023-05-20T07:54:07Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Acting in Delayed Environments with Non-Stationary Markov Policies [57.52103323209643]
We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.
We prove that with execution delay, deterministic Markov policies in the original state-space are sufficient for attaining maximal reward, but need to be non-stationary.
We devise a non-stationary Q-learning style model-based algorithm that solves delayed execution tasks without resorting to state-augmentation.
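As a concrete illustration of the execution-delay setting (not of the paper's algorithm), the sketch below shows the interaction protocol: an action committed at time $t$ only reaches the environment at time $t+m$, so the agent always acts through an $m$-step action buffer; the toy environment, placeholder policy, and delay value are illustrative assumptions.

```python
from collections import deque
import random

random.seed(0)
m = 3                                               # execution delay in steps
n_states, n_actions = 5, 2

def env_step(state, action):
    """Toy environment used only for illustration: random-walk dynamics and a goal reward."""
    next_state = (state + (1 if action == 1 else -1)) % n_states
    return next_state, (1.0 if next_state == 0 else 0.0)

state, total = 0, 0.0
pending = deque(random.randrange(n_actions) for _ in range(m))   # actions already committed

for t in range(100):
    # Commit an action now; it will only reach the environment m steps from now.
    pending.append(random.randrange(n_actions))     # placeholder policy (the paper learns a
                                                    # non-stationary one, without state augmentation)
    executed = pending.popleft()                    # the action chosen m steps ago is executed
    state, reward = env_step(state, executed)
    total += reward

print("return over 100 steps:", total)
```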
arXiv Detail & Related papers (2021-01-28T13:35:37Z) - Accelerated Learning with Robustness to Adversarial Regressors [1.0499611180329802]
We propose a new discrete time algorithm which provides stability and convergence guarantees in the presence of adversarial regressors.
In particular, our algorithm reaches an $\epsilon$-suboptimal point in at most $\tilde{\mathcal{O}}(1/\sqrt{\epsilon})$ iterations when regressors are constant.
arXiv Detail & Related papers (2020-05-04T14:42:49Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)