Physics-informed Value Learner for Offline Goal-Conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2509.06782v1
- Date: Mon, 08 Sep 2025 15:08:42 GMT
- Title: Physics-informed Value Learner for Offline Goal-Conditioned Reinforcement Learning
- Authors: Vittorio Giammarino, Ruiqi Ni, Ahmed H. Qureshi
- Abstract summary: We propose a Physics-informed (Pi) regularized loss for value learning, derived from the Eikonal Partial Differential Equation (PDE). Our formulation is grounded in continuous-time optimal control and encourages value functions to align with cost-to-go structures. When combined with Hierarchical Implicit Q-Learning (HIQL), the resulting method yields significant improvements in both performance and generalization.
- Score: 20.424372965054832
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline Goal-Conditioned Reinforcement Learning (GCRL) holds great promise for domains such as autonomous navigation and locomotion, where collecting interactive data is costly and unsafe. However, it remains challenging in practice due to the need to learn from datasets with limited coverage of the state-action space and to generalize across long-horizon tasks. To address these challenges, we propose a Physics-informed (Pi) regularized loss for value learning, derived from the Eikonal Partial Differential Equation (PDE), which induces a geometric inductive bias in the learned value function. Unlike generic gradient penalties that are primarily used to stabilize training, our formulation is grounded in continuous-time optimal control and encourages value functions to align with cost-to-go structures. The proposed regularizer is broadly compatible with temporal-difference-based value learning and can be integrated into existing Offline GCRL algorithms. When combined with Hierarchical Implicit Q-Learning (HIQL), the resulting method, Physics-informed HIQL (Pi-HIQL), yields significant improvements in both performance and generalization, with pronounced gains in stitching regimes and large-scale navigation tasks.
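To make the Eikonal connection concrete, here is a minimal sketch of how such a regularizer might be attached to a goal-conditioned value network. It assumes unit motion cost, so the Eikonal PDE reduces to ||∇_s V(s, g)|| = 1; the function names, the PyTorch autograd mechanics, and the coefficient `pi_coef` are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: Eikonal-style gradient penalty for a goal-conditioned
# value network V(s, g), assuming unit motion cost so ||dV/ds|| should be 1.
# `value_fn`, `pi_coef`, and this exact pairing with a TD loss are
# illustrative assumptions, not the paper's code.
import torch

def eikonal_regularizer(value_fn, states, goals):
    """Penalize deviation of ||grad_s V(s, g)|| from 1."""
    states = states.clone().requires_grad_(True)
    values = value_fn(states, goals)                   # (batch,)
    grads = torch.autograd.grad(values.sum(), states,
                                create_graph=True)[0]  # (batch, state_dim)
    return ((grads.norm(dim=-1) - 1.0) ** 2).mean()

def pi_value_loss(td_loss, value_fn, states, goals, pi_coef=0.1):
    """Standard TD value objective plus the physics-informed (Pi) term."""
    return td_loss + pi_coef * eikonal_regularizer(value_fn, states, goals)
```

Because the term only touches the value function's input gradients, it can be added onto essentially any temporal-difference objective, which is what makes it compatible with methods like HIQL.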
Related papers
- Quantile Q-Learning: Revisiting Offline Extreme Q-Learning with Quantile Regression [14.037591273612788]
Offline reinforcement learning (RL) enables policy learning from fixed datasets without further environment interaction. Extreme $Q$-Learning (XQL) is a recent offline RL method that models Bellman errors using Extreme Value Theory. We propose a principled method to estimate the temperature coefficient via quantile regression under mild assumptions.
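For reference, quantile regression reduces to minimizing the pinball loss; a tiny sketch follows, where the function name and its application to Bellman-error residuals are assumptions about usage rather than the paper's estimator.

```python
# Hedged sketch of the pinball (quantile) loss underlying quantile
# regression; applying it to Bellman-error residuals is an assumption
# about usage, not the paper's exact procedure.
import torch

def pinball_loss(residual: torch.Tensor, q: float = 0.9) -> torch.Tensor:
    """Quantile loss: q*u for u >= 0 and (q - 1)*u for u < 0."""
    return torch.maximum(q * residual, (q - 1.0) * residual).mean()
```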
arXiv Detail & Related papers (2025-11-15T01:10:05Z)
- OFMU: Optimization-Driven Framework for Machine Unlearning [5.100622189286672]
Large language models increasingly require the ability to unlearn specific knowledge, such as in response to user requests, copyright claims, or outdated information. We propose OFMU, a penalty-based bi-level optimization framework that explicitly prioritizes forgetting while preserving retention. We show that OFMU consistently outperforms existing unlearning methods in both efficacy and retained utility.
arXiv Detail & Related papers (2025-09-26T15:31:32Z)
- Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning [27.73410730631346]
We use physics-informed neural networks to approximate Hamilton-Jacobi-Bellman (HJB)-based value functions at scale. This improves gradient fidelity, in turn yielding more accurate values and stronger policy learning. Our results demonstrate that our approach consistently outperforms existing continuous-time baselines and scales to complex multi-agent dynamics.
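As a rough illustration of the physics-informed ingredient, one can penalize the residual of a discounted HJB equation at sampled states; the dynamics f, reward r, discount rho, and the sampled-action maximization below are placeholders, not the paper's setup.

```python
# Hedged sketch: physics-informed residual for
# rho*V(s) = max_a [ r(s, a) + dV/ds . f(s, a) ],
# with the max taken over a list of candidate actions. All names here
# (value_fn, f, r, rho) are placeholder assumptions.
import torch

def hjb_residual(value_fn, f, r, states, cand_actions, rho=0.1):
    """Mean squared residual of the discounted HJB equation."""
    states = states.clone().requires_grad_(True)
    v = value_fn(states)                                        # (batch,)
    dv = torch.autograd.grad(v.sum(), states, create_graph=True)[0]
    ham = torch.stack([r(states, a) + (dv * f(states, a)).sum(-1)
                       for a in cand_actions], dim=-1)          # (batch, n_actions)
    return ((rho * v - ham.max(dim=-1).values) ** 2).mean()
```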
arXiv Detail & Related papers (2025-09-11T04:12:50Z)
- EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time. We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z)
- Scalable In-Context Q-Learning [68.9917436397079]
We propose Scalable In-Context Q-Learning (SICQL) to steer in-context reinforcement learning (ICRL). SICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward maximization and task generalization.
arXiv Detail & Related papers (2025-06-02T04:21:56Z)
- Accelerated Learning with Linear Temporal Logic using Differentiable Simulation [21.84092672461171]
Traditional safety assurance approaches, such as state avoidance and constrained Markov decision processes, often inadequately capture trajectory requirements. We propose the first method that integrates LTL with differentiable simulators, facilitating efficient gradient-based learning directly from specifications. Our approach introduces soft labeling to achieve differentiable rewards and states, effectively mitigating the sparse-reward issue intrinsic to LTL without compromising objective correctness.
arXiv Detail & Related papers (2025-06-01T20:59:40Z)
- Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning [15.902089688167871]
Offline goal-conditioned reinforcement learning (GCRL) offers a practical learning paradigm where goal-reaching policies are trained from abundant unlabeled datasets. We propose Option-aware Temporally Abstracted value learning, dubbed OTA, which incorporates temporal abstraction into the temporal-difference learning process. We experimentally show that the high-level policy extracted using OTA achieves strong performance on complex tasks from OGBench.
arXiv Detail & Related papers (2025-05-19T05:51:11Z)
- GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs [34.90826139012299]
We propose Gradient Rectified Unlearning (GRU), an improved framework that regulates the directions of updates during the unlearning procedure. GRU is easy and general to implement, demonstrating practical effectiveness across a variety of well-established unlearning benchmarks.
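One common way to "regulate update directions" is a PCGrad-style projection; the sketch below illustrates that general idea under stated assumptions and is not GRU's exact update rule.

```python
# Hedged sketch: if the unlearning gradient g_u conflicts with the retention
# gradient g_r (negative inner product), project out the conflicting part.
# This PCGrad-style projection is illustrative; GRU's actual rectification
# may differ.
import torch

def rectify(g_u: torch.Tensor, g_r: torch.Tensor) -> torch.Tensor:
    """Remove from g_u the component that opposes g_r (flattened params)."""
    dot = torch.dot(g_u, g_r)
    if dot < 0:  # conflict: the unlearning step would undo retention
        g_u = g_u - (dot / g_r.pow(2).sum()) * g_r
    return g_u
```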
arXiv Detail & Related papers (2025-03-12T07:08:54Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy, even while using no exemplar buffer and only 1.02x the base model size.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
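As a stand-in for the adaptive scheme, the sketch below quantizes dataset actions with a learned k-means codebook so that discrete-action offline RL methods can be applied on top; the codebook approach and all names here are assumptions, not the paper's algorithm.

```python
# Hedged sketch: discretize continuous actions via a k-means codebook fit
# to the dataset, so discrete-action offline RL losses can then be used.
# This is illustrative only; the paper's adaptive scheme is different.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(actions: np.ndarray, n_codes: int = 64) -> KMeans:
    """Fit a discrete codebook to (num_samples, action_dim) actions."""
    return KMeans(n_clusters=n_codes, n_init=10).fit(actions)

def quantize(codebook: KMeans, action: np.ndarray) -> int:
    """Map one continuous action to its nearest codebook index."""
    return int(codebook.predict(action.reshape(1, -1))[0])
```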
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Offline Reinforcement Learning with Value-based Episodic Memory [19.12430651038357]
Offline reinforcement learning (RL) shows promise for applying RL to real-world problems.
We propose Expectile V-Learning (EVL), which smoothly interpolates between optimal value learning and behavior cloning. We present a new offline method called Value-based Episodic Memory (VEM).
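The interpolation EVL describes comes from the asymmetry of the expectile loss; a minimal sketch follows, where pairing it with a TD error is an assumption about usage.

```python
# Hedged sketch of the expectile loss behind EVL-style value learning:
# tau = 0.5 is symmetric mean regression (behavior-cloning-like), while
# tau -> 1 weights positive errors more, approaching optimal value learning.
import torch

def expectile_loss(td_error: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """Asymmetric squared loss: |tau - 1{u < 0}| * u^2."""
    weight = torch.abs(tau - (td_error < 0).float())
    return (weight * td_error.pow(2)).mean()
```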
arXiv Detail & Related papers (2021-10-19T08:20:11Z)
- IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is a simple method that is widely used due to its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
- Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
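For discrete actions, CQL's conservative term has a compact form: push down a soft maximum of Q over all actions while pushing up Q on dataset actions. The sketch below assumes a discrete action space and an illustrative weight `alpha`.

```python
# Hedged sketch of CQL's conservative penalty (discrete-action case):
# E[ logsumexp_a Q(s, a) - Q(s, a_data) ], added to the usual Bellman loss.
# `alpha` and the discrete setting are illustrative choices; the paper
# also covers continuous actions.
import torch

def cql_penalty(q_values: torch.Tensor, dataset_actions: torch.Tensor,
                alpha: float = 1.0) -> torch.Tensor:
    """q_values: (batch, n_actions); dataset_actions: (batch,) action indices."""
    logsumexp_q = torch.logsumexp(q_values, dim=-1)   # soft max over actions
    data_q = q_values.gather(-1, dataset_actions.unsqueeze(-1)).squeeze(-1)
    return alpha * (logsumexp_q - data_q).mean()
```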
arXiv Detail & Related papers (2020-06-08T17:53:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.