Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
- URL: http://arxiv.org/abs/2512.12046v1
- Date: Fri, 12 Dec 2025 21:37:11 GMT
- Title: Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
- Authors: Vittorio Giammarino, Ahmed H. Qureshi
- Abstract summary: Eikonal-Constrained Quasimetric RL (Eik-QRL) is a continuous-time reformulation of Quasimetric RL based on the Eikonal Partial Differential Equation (PDE). Its hierarchical extension, Eik-HiQRL, achieves state-of-the-art performance in offline goal-conditioned navigation and yields consistent gains over QRL in manipulation tasks, matching temporal-difference methods.
- Score: 16.84451472788859
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than maximizing hand-crafted reward signals. In this setting, the optimal goal-conditioned value function naturally forms a quasimetric, motivating Quasimetric RL (QRL), which constrains value learning to quasimetric mappings and enforces local consistency through discrete, trajectory-based constraints. We propose Eikonal-Constrained Quasimetric RL (Eik-QRL), a continuous-time reformulation of QRL based on the Eikonal Partial Differential Equation (PDE). This PDE-based structure makes Eik-QRL trajectory-free, requiring only sampled states and goals, while improving out-of-distribution generalization. We provide theoretical guarantees for Eik-QRL and identify limitations that arise under complex dynamics. To address these challenges, we introduce Eik-Hierarchical QRL (Eik-HiQRL), which integrates Eik-QRL into a hierarchical decomposition. Empirically, Eik-HiQRL achieves state-of-the-art performance in offline goal-conditioned navigation and yields consistent gains over QRL in manipulation tasks, matching temporal-difference methods.
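To make the trajectory-free constraint concrete, here is a minimal sketch of an Eikonal-style regularizer on a goal-conditioned value network. All names (`ValueNet`, `eikonal_loss`), the architecture, and the unit-cost assumption ||∇_s V|| ≈ 1 are illustrative; the paper's exact loss and quasimetric parametrization may differ.

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Goal-conditioned value network V(s, g); architecture is illustrative."""
    def __init__(self, state_dim: int, goal_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, g], dim=-1)).squeeze(-1)

def eikonal_loss(value_net: ValueNet, s: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of ||grad_s V(s, g)|| from 1 (unit-cost Eikonal PDE).

    Only independently sampled states and goals are required -- no trajectories.
    """
    s = s.clone().requires_grad_(True)            # enable grad w.r.t. states
    v = value_net(s, g)                           # V(s, g), shape (batch,)
    (grad_s,) = torch.autograd.grad(v.sum(), s, create_graph=True)
    return ((grad_s.norm(dim=-1) - 1.0) ** 2).mean()
```

Note that only sampled (s, g) pairs enter the loss, which is what makes the constraint trajectory-free: no transitions or rollouts appear anywhere in the objective.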
Related papers
- Variational Quantum Circuit-Based Reinforcement Learning for Dynamic Portfolio Optimization [7.349651640835185]
This paper presents a Quantum Reinforcement Learning solution to the dynamic portfolio optimization problem based on Variational Quantum Circuits. We show that our quantum agents achieve risk-adjusted performance comparable to, and in some cases exceeding, that of classical Deep RL models.
arXiv Detail & Related papers (2026-01-20T15:17:24Z) - Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z) - Physics-informed Value Learner for Offline Goal-Conditioned Reinforcement Learning [20.424372965054832]
We propose a Physics-informed (Pi) regularized loss for value learning, derived from the Eikonal Partial Differential Equation (PDE). Unlike generic gradient penalties that are primarily used to stabilize training, our formulation is grounded in continuous-time optimal control and encourages value functions to align with cost-to-go structures. The proposed regularizer is broadly compatible with temporal-difference-based value learning and can be integrated into existing Offline GCRL algorithms.
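As a rough illustration of the general shape such an Eikonal-based regularizer takes (not the paper's verbatim formulation), one can augment the usual TD loss with a gradient-norm residual; here λ (regularization weight) and c(s) (local cost) are assumed notation.

```latex
% Illustrative Eikonal-regularized value objective; \lambda and c(s) are assumed.
% The PDE constrains the cost-to-go gradient; the penalty enforces it in expectation.
\[
\big\|\nabla_s V^{*}(s,g)\big\| = \frac{1}{c(s)}
\quad\Longrightarrow\quad
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{TD}}
\;+\; \lambda\,\mathbb{E}_{s,g}\Big[\big(c(s)\,\big\|\nabla_s V(s,g)\big\| - 1\big)^{2}\Big].
\]
```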
arXiv Detail & Related papers (2025-09-08T15:08:42Z) - Deep Unfolded Local Quantum Annealing [4.726777092009553]
Local quantum annealing (LQA), an iterative algorithm, is designed to solve optimization problems.
It draws inspiration from QA, which utilizes an adiabatic evolution to determine the global minimum of a given objective function.
We show that deep unfolded LQA outperforms the original LQA, with promising implications for real-world applications.
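As a generic illustration of deep unfolding (not the paper's specific construction), the T iterations of an iterative solver are treated as T network layers whose step sizes are learned end-to-end:

```latex
% Generic deep-unfolding template (illustrative): iteration t becomes layer t,
% with per-layer step sizes \eta_t trained by backpropagation.
\[
x^{(t+1)} = x^{(t)} - \eta_t\,\nabla f\big(x^{(t)}\big),
\qquad t = 0,\dots,T-1,
\qquad \{\eta_t\}_{t=0}^{T-1} \ \text{learned end-to-end.}
\]
```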
arXiv Detail & Related papers (2024-08-06T08:19:51Z) - SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning [89.04776523010409]
This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics.
In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping.
We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI.
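In standard successor-feature notation (assumed here), the decomposition and the generalized policy improvement (GPI) rule read:

```latex
% Standard SF decomposition: rewards are linear in features \phi with task
% weights w_i; \psi^\pi accumulates discounted features; GPI maximizes over
% a set of previously learned policies \pi_1,\dots,\pi_n.
\[
r_i(s,a,s') = \phi(s,a,s')^{\top} w_i,
\qquad
\psi^{\pi}(s,a) = \mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t,a_t,s_{t+1}) \,\Big|\, s_0 = s,\, a_0 = a\Big],
\]
\[
Q_i^{\pi}(s,a) = \psi^{\pi}(s,a)^{\top} w_i,
\qquad
\pi^{\mathrm{GPI}}(s) \in \arg\max_{a} \max_{j} \psi^{\pi_j}(s,a)^{\top} w_i .
\]
```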
arXiv Detail & Related papers (2024-05-24T20:30:14Z) - DPO: A Differential and Pointwise Control Approach to Reinforcement Learning [3.2857981869020327]
Reinforcement learning (RL) in continuous state-action spaces remains challenging in scientific computing. We introduce Differential Reinforcement Learning (Differential RL), a novel framework that reformulates RL from a continuous-time control perspective. We develop Differential Policy Optimization (DPO), a pointwise, stage-wise algorithm that refines local movement operators.
arXiv Detail & Related papers (2024-04-24T03:11:12Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in
Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably determine, at an early stage of training, whether Q-values will diverge.
arXiv Detail & Related papers (2023-10-06T17:57:44Z) - Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning [73.80728148866906]
Quasimetric Reinforcement Learning (QRL) is a new RL method that utilizes quasimetric models to learn optimal value functions.
On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance.
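For context, a quasimetric satisfies the metric axioms except symmetry, which matches the asymmetry of reachability (going s → g may be harder than g → s); under unit per-step cost, the optimal goal-conditioned value is the negated optimal quasimetric:

```latex
% Quasimetric axioms: identity and triangle inequality hold, symmetry need not.
\[
d(x,x) = 0, \qquad d(x,z) \le d(x,y) + d(y,z), \qquad d(x,y) \ne d(y,x) \ \text{allowed},
\]
\[
V^{*}(s,g) = -\,d^{*}(s,g) \quad \text{(under unit per-step cost).}
\]
```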
arXiv Detail & Related papers (2023-04-03T17:59:58Z) - EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL [48.552287941528]
Off-policy reinforcement learning holds the promise of sample-efficient learning of decision-making policies.
In the offline RL setting, standard off-policy RL methods can significantly underperform.
We introduce Expected-Max Q-Learning (EMaQ), whose theoretical operator is more closely aligned with the practical algorithm it yields.
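As commonly stated (exact notation may differ from the paper), the EMaQ backup replaces the max over all actions with an expected max over N samples drawn from the behavior policy μ, so that N interpolates between behavior cloning (N = 1) and standard Q-learning (N → ∞):

```latex
% EMaQ backup with N behavior-policy samples; larger N is more optimistic.
\[
(\mathcal{T}^{N} Q)(s,a) = r(s,a) + \gamma\,
\mathbb{E}_{\{a'_i\}_{i=1}^{N} \sim \mu(\cdot \mid s')}\Big[\max_{i=1,\dots,N} Q\big(s', a'_i\big)\Big].
\]
```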
arXiv Detail & Related papers (2020-07-21T21:13:02Z) - Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
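The lower-bound property comes from the core CQL regularizer, shown here in its standard form: Q-values are pushed down under the policy's action distribution μ and pushed up on dataset actions, on top of the usual Bellman error (α is the conservatism weight, \hat{B}^π the empirical Bellman operator).

```latex
% Standard CQL objective: conservative penalty plus TD error over dataset D.
\[
\min_{Q}\; \alpha\Big(\mathbb{E}_{s\sim\mathcal{D},\,a\sim\mu(\cdot\mid s)}\big[Q(s,a)\big]
- \mathbb{E}_{(s,a)\sim\mathcal{D}}\big[Q(s,a)\big]\Big)
+ \tfrac{1}{2}\,\mathbb{E}_{(s,a,s')\sim\mathcal{D}}\Big[\big(Q(s,a) - \hat{\mathcal{B}}^{\pi}\hat{Q}(s,a)\big)^{2}\Big].
\]
```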
arXiv Detail & Related papers (2020-06-08T17:53:42Z)