Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
- URL: http://arxiv.org/abs/2506.16016v1
- Date: Thu, 19 Jun 2025 04:27:17 GMT
- Title: Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
- Authors: William Sharpless, Dylan Hirsch, Sander Tonkens, Nikhil Shinde, Sylvia Herbert,
- Abstract summary: Hard constraints in reinforcement learning (RL) often degrade policy performance.<n>We extend recent advances that connect Hamilton-Jacobi equations with RL to propose two novel value functions for dual-objective satisfaction.<n>We propose a variation of Proximal Policy Optimization (DO-HJ-PPO) which solves these problems.
- Score: 0.5937476291232802
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hard constraints in reinforcement learning (RL), whether imposed via the reward function or the model architecture, often degrade policy performance. Lagrangian methods offer a way to blend objectives with constraints, but often require intricate reward engineering and parameter tuning. In this work, we extend recent advances that connect Hamilton-Jacobi (HJ) equations with RL to propose two novel value functions for dual-objective satisfaction. Namely, we address: (1) the Reach-Always-Avoid problem - of achieving distinct reward and penalty thresholds - and (2) the Reach-Reach problem - of achieving thresholds of two distinct rewards. In contrast with temporal logic approaches, which typically involve representing an automaton, we derive explicit, tractable Bellman forms in this context by decomposing our problem into reach, avoid, and reach-avoid problems, as to leverage these aforementioned recent advances. From a mathematical perspective, the Reach-Always-Avoid and Reach-Reach problems are complementary and fundamentally different from standard sum-of-rewards problems and temporal logic problems, providing a new perspective on constrained decision-making. We leverage our analysis to propose a variation of Proximal Policy Optimization (DO-HJ-PPO), which solves these problems. Across a range of tasks for safe-arrival and multi-target achievement, we demonstrate that DO-HJ-PPO produces qualitatively distinct behaviors from previous approaches and out-competes a number of baselines in various metrics.
Related papers
- Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex.<n>Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs.<n>We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z) - A Two-Stage Learning-to-Defer Approach for Multi-Task Learning [3.4289478404209826]
We introduce a novel Two-Stage L2D framework for multi-task learning that integrates classification and regression through a unified deferral mechanism.<n>Our method leverages a two-stage surrogate loss family, which we prove to be both Bayes-consistent and $(mathcalG, mathcalR)$-consistent.<n> Experiments on object detection and electronic health record analysis demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T07:44:57Z) - Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning [26.244121960815907]
We propose a primal-based framework that orchestrates policy optimization between multi-objective learning and constraint adherence.
Our method employs a novel natural policy gradient manipulation method to optimize multiple RL objectives.
Empirically, our proposed method also outperforms prior state-of-the-art methods on challenging safe multi-objective reinforcement learning tasks.
arXiv Detail & Related papers (2024-05-26T00:42:10Z) - Near-Optimal Solutions of Constrained Learning Problems [85.48853063302764]
In machine learning systems, the need to curtail their behavior has become increasingly apparent.
This is evidenced by recent advancements towards developing models that satisfy dual robustness variables.
Our results show that rich parametrizations effectively mitigate non-dimensional, finite learning problems.
arXiv Detail & Related papers (2024-03-18T14:55:45Z) - Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning [13.245000585002858]
In many real-world applications, a reinforcement learning (RL) agent should consider multiple objectives and adhere to safety guidelines.
We propose a constrained multi-objective gradient aggregation algorithm named Constrained Multi-Objective Gradient Aggregator (CoGAMO)
arXiv Detail & Related papers (2024-03-01T04:57:13Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Taming Lagrangian Chaos with Multi-Objective Reinforcement Learning [0.0]
We consider the problem of two active particles in 2D complex flows with the multi-objective goals of minimizing both the dispersion rate and the energy consumption of the pair.
We approach the problem by means of Multi Objective Reinforcement Learning (MORL), combining scalarization techniques together with a Q-learning algorithm, for Lagrangian drifters that have variable swimming velocity.
arXiv Detail & Related papers (2022-12-19T16:50:58Z) - UNIFY: a Unified Policy Designing Framework for Solving Constrained
Optimization Problems with Machine Learning [18.183339583346005]
We propose a unified framework to design a solution policy for complex decision-making problems.
Our approach relies on a clever decomposition of the policy in two stages, namely an unconstrained ML model and a CO problem.
We demonstrate the method effectiveness on two practical problems, namely an Energy Management System and the Set Multi-cover with coverage requirements.
arXiv Detail & Related papers (2022-10-25T14:09:24Z) - Algorithm for Constrained Markov Decision Process with Linear
Convergence [55.41644538483948]
An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed with the integration of two ingredients: entropy regularized policy and Vaidya's dual.
The proposed approach is shown to converge (with linear rate) to the global optimum.
arXiv Detail & Related papers (2022-06-03T16:26:38Z) - State Augmented Constrained Reinforcement Learning: Overcoming the
Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds.
We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards.
This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods.
arXiv Detail & Related papers (2021-02-23T21:07:35Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.