Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
- URL: http://arxiv.org/abs/2512.12046v1
- Date: Fri, 12 Dec 2025 21:37:11 GMT
- Title: Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
- Authors: Vittorio Giammarino, Ahmed H. Qureshi
- Abstract summary: Eikonal-Constrained Quasimetric RL (Eik-QRL) is a continuous-time reformulation of Quasimetric RL based on the Eikonal Partial Differential Equation (PDE). Its hierarchical extension, Eik-HiQRL, achieves state-of-the-art performance in offline goal-conditioned navigation and yields consistent gains over QRL in manipulation tasks, matching temporal-difference methods.
- Score: 16.84451472788859
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than maximizing hand-crafted reward signals. In this setting, the optimal goal-conditioned value function naturally forms a quasimetric, motivating Quasimetric RL (QRL), which constrains value learning to quasimetric mappings and enforces local consistency through discrete, trajectory-based constraints. We propose Eikonal-Constrained Quasimetric RL (Eik-QRL), a continuous-time reformulation of QRL based on the Eikonal Partial Differential Equation (PDE). This PDE-based structure makes Eik-QRL trajectory-free, requiring only sampled states and goals, while improving out-of-distribution generalization. We provide theoretical guarantees for Eik-QRL and identify limitations that arise under complex dynamics. To address these challenges, we introduce Eik-Hierarchical QRL (Eik-HiQRL), which integrates Eik-QRL into a hierarchical decomposition. Empirically, Eik-HiQRL achieves state-of-the-art performance in offline goal-conditioned navigation and yields consistent gains over QRL in manipulation tasks, matching temporal-difference methods.
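To make the trajectory-free constraint concrete, here is a minimal sketch of an Eikonal-style regularizer on a goal-conditioned value network. All names (`ValueNet`, `eikonal_loss`), the architecture, and the unit-cost assumption ||∇_s V|| ≈ 1 are illustrative; the paper's exact loss and quasimetric parametrization may differ.

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Goal-conditioned value network V(s, g); architecture is illustrative."""
    def __init__(self, state_dim: int, goal_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, g], dim=-1)).squeeze(-1)

def eikonal_loss(value_net: ValueNet, s: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of ||grad_s V(s, g)|| from 1 (unit-cost Eikonal PDE).

    Only independently sampled states and goals are required -- no trajectories.
    """
    s = s.clone().requires_grad_(True)            # enable grad w.r.t. states
    v = value_net(s, g)                           # V(s, g), shape (batch,)
    (grad_s,) = torch.autograd.grad(v.sum(), s, create_graph=True)
    return ((grad_s.norm(dim=-1) - 1.0) ** 2).mean()
```

Note that only sampled (s, g) pairs enter the loss, which is what makes the constraint trajectory-free: no transitions or rollouts appear anywhere in the objective.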
Related papers
- Variational Quantum Circuit-Based Reinforcement Learning for Dynamic Portfolio Optimization [7.349651640835185]
This paper presents a Quantum Reinforcement Learning solution to the dynamic portfolio optimization problem based on Variational Quantum Circuits. We show that our quantum agents achieve risk-adjusted performance comparable to, and in some cases exceeding, that of classical Deep RL models.
arXiv Detail & Related papers (2026-01-20T15:17:24Z) - Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z) - Physics-informed Value Learner for Offline Goal-Conditioned Reinforcement Learning [20.424372965054832]
We propose a Physics-informed (Pi) regularized loss for value learning, derived from the Eikonal Partial Differential Equation (PDE). Unlike generic gradient penalties that are primarily used to stabilize training, our formulation is grounded in continuous-time optimal control and encourages value functions to align with cost-to-go structures. The proposed regularizer is broadly compatible with temporal-difference-based value learning and can be integrated into existing Offline GCRL algorithms.
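As a rough illustration of the general shape such an Eikonal-based regularizer takes (not the paper's verbatim formulation), one can augment the usual TD loss with a gradient-norm residual; here λ (regularization weight) and c(s) (local cost) are assumed notation.

```latex
% Illustrative Eikonal-regularized value objective; \lambda and c(s) are assumed.
% The PDE constrains the cost-to-go gradient; the penalty enforces it in expectation.
\[
\big\|\nabla_s V^{*}(s,g)\big\| = \frac{1}{c(s)}
\quad\Longrightarrow\quad
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{TD}}
\;+\; \lambda\,\mathbb{E}_{s,g}\Big[\big(c(s)\,\big\|\nabla_s V(s,g)\big\| - 1\big)^{2}\Big].
\]
```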
arXiv Detail & Related papers (2025-09-08T15:08:42Z) - Deep Unfolded Local Quantum Annealing [4.726777092009553]
Local quantum annealing (LQA), an iterative algorithm, is designed to solve optimization problems.
It draws inspiration from QA, which utilizes an adiabatic evolution to determine the global minimum of a given objective function.
We show that deep unfolded LQA outperforms the original LQA, with promising implications for real-world applications.
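As a generic illustration of deep unfolding (not the paper's specific construction), the T iterations of an iterative solver are treated as T network layers whose step sizes are learned end-to-end:

```latex
% Generic deep-unfolding template (illustrative): iteration t becomes layer t,
% with per-layer step sizes \eta_t trained by backpropagation.
\[
x^{(t+1)} = x^{(t)} - \eta_t\,\nabla f\big(x^{(t)}\big),
\qquad t = 0,\dots,T-1,
\qquad \{\eta_t\}_{t=0}^{T-1} \ \text{learned end-to-end.}
\]
```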
arXiv Detail & Related papers (2024-08-06T08:19:51Z) - SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning [89.04776523010409]
This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics.
In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping.
We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI.
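In standard successor-feature notation (assumed here), the decomposition and the generalized policy improvement (GPI) rule read:

```latex
% Standard SF decomposition: rewards are linear in features \phi with task
% weights w_i; \psi^\pi accumulates discounted features; GPI maximizes over
% a set of previously learned policies \pi_1,\dots,\pi_n.
\[
r_i(s,a,s') = \phi(s,a,s')^{\top} w_i,
\qquad
\psi^{\pi}(s,a) = \mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t,a_t,s_{t+1}) \,\Big|\, s_0 = s,\, a_0 = a\Big],
\]
\[
Q_i^{\pi}(s,a) = \psi^{\pi}(s,a)^{\top} w_i,
\qquad
\pi^{\mathrm{GPI}}(s) \in \arg\max_{a} \max_{j} \psi^{\pi_j}(s,a)^{\top} w_i .
\]
```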
arXiv Detail & Related papers (2024-05-24T20:30:14Z) - DPO: A Differential and Pointwise Control Approach to Reinforcement Learning [3.2857981869020327]
Reinforcement learning (RL) in continuous state-action spaces remains challenging in scientific computing. We introduce Differential Reinforcement Learning (Differential RL), a novel framework that reformulates RL from a continuous-time control perspective. We develop Differential Policy Optimization (DPO), a pointwise, stage-wise algorithm that refines local movement operators.
arXiv Detail & Related papers (2024-04-24T03:11:12Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in
Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably determine, at an early stage of training, whether Q-values will diverge.
arXiv Detail & Related papers (2023-10-06T17:57:44Z) - Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning [73.80728148866906]
Quasimetric Reinforcement Learning (QRL) is a new RL method that utilizes quasimetric models to learn optimal value functions.
On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance.
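For context, a quasimetric satisfies the metric axioms except symmetry, which matches the asymmetry of reachability (going s → g may be harder than g → s); under unit per-step cost, the optimal goal-conditioned value is the negated optimal quasimetric:

```latex
% Quasimetric axioms: identity and triangle inequality hold, symmetry need not.
\[
d(x,x) = 0, \qquad d(x,z) \le d(x,y) + d(y,z), \qquad d(x,y) \ne d(y,x) \ \text{allowed},
\]
\[
V^{*}(s,g) = -\,d^{*}(s,g) \quad \text{(under unit per-step cost).}
\]
```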
arXiv Detail & Related papers (2023-04-03T17:59:58Z) - EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL [48.552287941528]
Off-policy reinforcement learning holds the promise of sample-efficient learning of decision-making policies.
In the offline RL setting, standard off-policy RL methods can significantly underperform.
We introduce Expected-Max Q-Learning (EMaQ), whose theoretical operator is more closely aligned with the practical algorithm it yields.
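As commonly stated (exact notation may differ from the paper), the EMaQ backup replaces the max over all actions with an expected max over N samples drawn from the behavior policy μ, so that N interpolates between behavior cloning (N = 1) and standard Q-learning (N → ∞):

```latex
% EMaQ backup with N behavior-policy samples; larger N is more optimistic.
\[
(\mathcal{T}^{N} Q)(s,a) = r(s,a) + \gamma\,
\mathbb{E}_{\{a'_i\}_{i=1}^{N} \sim \mu(\cdot \mid s')}\Big[\max_{i=1,\dots,N} Q\big(s', a'_i\big)\Big].
\]
```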
arXiv Detail & Related papers (2020-07-21T21:13:02Z) - Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
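The lower-bound property comes from the core CQL regularizer, shown here in its standard form: Q-values are pushed down under the policy's action distribution μ and pushed up on dataset actions, on top of the usual Bellman error (α is the conservatism weight, \hat{B}^π the empirical Bellman operator).

```latex
% Standard CQL objective: conservative penalty plus TD error over dataset D.
\[
\min_{Q}\; \alpha\Big(\mathbb{E}_{s\sim\mathcal{D},\,a\sim\mu(\cdot\mid s)}\big[Q(s,a)\big]
- \mathbb{E}_{(s,a)\sim\mathcal{D}}\big[Q(s,a)\big]\Big)
+ \tfrac{1}{2}\,\mathbb{E}_{(s,a,s')\sim\mathcal{D}}\Big[\big(Q(s,a) - \hat{\mathcal{B}}^{\pi}\hat{Q}(s,a)\big)^{2}\Big].
\]
```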
arXiv Detail & Related papers (2020-06-08T17:53:42Z)