Lexicographic Multi-Objective Reinforcement Learning
- URL: http://arxiv.org/abs/2212.13769v1
- Date: Wed, 28 Dec 2022 10:22:36 GMT
- Title: Lexicographic Multi-Objective Reinforcement Learning
- Authors: Joar Skalse, Lewis Hammond, Charlie Griffin, Alessandro Abate
- Abstract summary: We present a family of both action-value and policy gradient algorithms that can be used to solve lexicographic multi-objective problems.
We show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
- Score: 65.90380946224869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we introduce reinforcement learning techniques for solving
lexicographic multi-objective problems. These are problems that involve
multiple reward signals, and where the goal is to learn a policy that maximises
the first reward signal, and subject to this constraint also maximises the
second reward signal, and so on. We present a family of both action-value and
policy gradient algorithms that can be used to solve such problems, and prove
that they converge to policies that are lexicographically optimal. We evaluate
the scalability and performance of these algorithms empirically, demonstrating
their practical applicability. As a more specific application, we show how our
algorithms can be used to impose safety constraints on the behaviour of an
agent, and compare their performance in this context with that of other
constrained reinforcement learning algorithms.
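The action-value side of this idea is straightforward to make concrete. Below is a minimal tabular sketch, not the paper's exact algorithm: one Q-table per objective, with the action set filtered objective by objective so that lower-priority rewards only break (approximate) ties left by higher-priority ones; the slack `eps`, step sizes, and update rule are illustrative.

```python
import numpy as np

def lex_greedy_actions(q_tables, state, eps=1e-3):
    """Actions that are approximately lexicographically optimal in `state`.

    `q_tables` is a list of (n_states, n_actions) arrays, highest-priority
    objective first; `eps` is the slack that turns near-optimal actions
    into ties for the next objective to break.
    """
    actions = np.arange(q_tables[0].shape[1])
    for q in q_tables:
        vals = q[state, actions]
        actions = actions[vals >= vals.max() - eps]  # keep near-optimal ties
    return actions

def lex_q_update(q_tables, s, a, rewards, s_next, alpha=0.1, gamma=0.99):
    """One TD update per objective, bootstrapping each objective at a
    lexicographically optimal next action (one reward signal per objective)."""
    a_next = lex_greedy_actions(q_tables, s_next)[0]
    for q, r in zip(q_tables, rewards):
        q[s, a] += alpha * (r + gamma * q[s_next, a_next] - q[s, a])
```

With a safety signal as the first objective and the task reward as the second, this is one way to realise the safety-constraint application mentioned above: the agent optimises the task only among actions that are near-optimal for safety.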
Related papers
- Thresholded Lexicographic Ordered Multiobjective Reinforcement Learning [0.0]
Lexicographic multi-objective problems, which impose a lexicographic importance order over the objectives, arise in many real-life scenarios.
Existing Reinforcement Learning work directly addressing lexicographic tasks has been scarce.
We present a policy optimization approach built on our Lexicographic Projection Optimization (LPO) algorithm, which has the potential to address the theoretical and practical concerns these tasks raise.
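The summary does not spell out the projection, so the following is a hedged, PCGrad-style reading rather than LPO's published rule: when a lower-priority gradient conflicts with a higher-priority one, remove the conflicting component so the higher-priority objective cannot be harmed.

```python
import numpy as np

def project_lex(g_low, g_high):
    """Project `g_low` onto the half-space {g : g . g_high >= 0}.

    An illustrative lexicographic projection (an assumption, not LPO's
    exact construction): the low-priority update keeps only the part
    that does not oppose the high-priority gradient.
    """
    dot = g_low @ g_high
    if dot >= 0:
        return g_low                                   # no conflict
    return g_low - (dot / (g_high @ g_high)) * g_high  # drop conflicting part
```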
arXiv Detail & Related papers (2024-08-24T06:32:30Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
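Generalized Policy Improvement itself is compact: given value functions for a set of previously learned policies, act greedily with respect to their pointwise maximum, which is guaranteed to be no worse than any policy in the set. A sketch with successor features, where each policy's action values under a preference vector $w$ are $\psi_i(s,a) \cdot w$ (array shapes are illustrative):

```python
import numpy as np

def gpi_action(psi, w):
    """Greedy GPI action from successor features.

    psi: (n_policies, n_actions, d) successor features at the current state;
    w:   (d,) preference vector over the d reward features.
    """
    q = psi @ w                         # (n_policies, n_actions) action values
    return int(q.max(axis=0).argmax())  # greedy w.r.t. the pointwise maximum
```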
arXiv Detail & Related papers (2023-01-18T20:54:40Z)
- Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them.
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
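The hierarchical estimator is the paper's contribution; the "act pessimistically" step can only be sketched generically here. For a linear bandit, one scores each arm by a lower confidence bound under some posterior over the task parameter and pulls the maximiser; the posterior quantities below are placeholders, not HierOPO's estimates.

```python
import numpy as np

def pessimistic_arm(features, theta_mean, theta_cov, beta=1.0):
    """Pick the arm maximising a lower confidence bound on its reward.

    features: (n_arms, d) arm feature vectors; theta_mean (d,) and
    theta_cov (d, d) are placeholder posterior moments of the task
    parameter; beta scales the pessimism.
    """
    means = features @ theta_mean
    widths = np.sqrt(np.einsum("ad,dk,ak->a", features, theta_cov, features))
    return int(np.argmax(means - beta * widths))   # lower confidence bound
```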
arXiv Detail & Related papers (2022-12-09T08:26:27Z)
- Attaining Interpretability in Reinforcement Learning via Hierarchical Primitive Composition [3.1078562713129765]
We propose a novel hierarchical reinforcement learning algorithm that improves interpretability by decomposing the original task into a hierarchy.
We show how the proposed scheme can be employed in practice by solving a pick-and-place task with a 6-DoF manipulator.
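As a hedged illustration of such a decomposition (the primitives and termination tests are invented for a pick-and-place flavour, not taken from the paper), the top-level decision is a named primitive, which is what makes the behaviour inspectable:

```python
import numpy as np

def reach(obs):
    return obs["object"] - obs["gripper"]   # move gripper toward the object

def place(obs):
    return obs["target"] - obs["gripper"]   # move gripper toward the target

PRIMITIVES = {"reach": reach, "place": place}

def primitive_done(name, obs, tol=1e-2):
    goal = obs["object"] if name == "reach" else obs["target"]
    return np.linalg.norm(goal - obs["gripper"]) < tol

def high_level_step(obs):
    # The top level is interpretable: its output is a named primitive.
    name = "reach" if not primitive_done("reach", obs) else "place"
    return name, PRIMITIVES[name](obs)      # (readable label, low-level action)
```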
arXiv Detail & Related papers (2021-10-05T05:59:31Z)
- Inverse Reinforcement Learning with Explicit Policy Estimates [19.159290496678004]
Various methods for solving the inverse reinforcement learning problem have been developed independently in machine learning and economics.
We show that they all belong to a class of optimization problems characterized by a common form of the gradient, the associated policy, and the objective.
Using insights which emerge from our study of this class of optimization problems, we identify various problem scenarios and investigate each method's suitability for these problems.
arXiv Detail & Related papers (2021-03-04T07:00:58Z)
- State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds.
We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards.
This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods.
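The mechanics can be sketched briefly (step size and update cadence are illustrative): the multipliers join the observation, so a single policy can condition its behaviour on how binding each constraint currently is, while dual ascent raises a multiplier whenever its constraint return falls short of the threshold.

```python
import numpy as np

def augment(obs, lambdas):
    """Policy input includes the current Lagrange multipliers."""
    return np.concatenate([obs, lambdas])

def dual_step(lambdas, constraint_returns, thresholds, eta=0.01):
    """Dual ascent: a shortfall below a threshold pushes its multiplier up;
    multipliers stay non-negative."""
    return np.maximum(0.0, lambdas + eta * (thresholds - constraint_returns))
```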
arXiv Detail & Related papers (2021-02-23T21:07:35Z)
- Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks [14.461847761198037]
This paper focuses on finding reinforcement learning policies for control systems with hard state and action constraints.
Previous works seeking to ensure constraint satisfaction, or safety, have focused on adding a projection step to a learned policy.
To tackle this problem, this paper proposes a new approach, termed Vertex Networks (VNs), with guarantees on safety during exploration and on learned control policies.
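The geometric idea behind the name can be sketched directly: if the safe set is a polytope with known vertices, a network that outputs convex-combination weights over those vertices can only ever emit actions inside that set. The sketch below assumes the vertices are given and uses a softmax over learned logits; shapes are illustrative.

```python
import numpy as np

def vertex_action(logits, vertices):
    """Map network logits to a safe action by convex combination.

    logits: (n_vertices,) outputs of a learned network (placeholder here);
    vertices: (n_vertices, action_dim) vertices of the safe polytope.
    """
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                # softmax -> convex weights
    return weights @ vertices               # stays inside the polytope
```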
arXiv Detail & Related papers (2020-03-20T20:32:20Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
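One standard way a critic over a discrete action set controls variance, offered as a hedged sketch rather than the paper's exact estimator, is to combine the critic's values across all actions: for a softmax policy, the gradient of the expected critic value has a closed form in which the policy's own value acts as a built-in baseline.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def all_actions_grad(logits, q_values):
    """Gradient of sum_a pi_a * Q_a w.r.t. the policy logits.

    logits, q_values: (n_actions,). The closed form pi * (Q - pi . Q)
    needs no sampled return, so the estimate has no Monte Carlo variance
    given the critic.
    """
    pi = softmax(logits)
    baseline = pi @ q_values            # state value under the current policy
    return pi * (q_values - baseline)
```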
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
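The mechanism is concrete enough to sketch (training-loop details simplified): fit the policy by supervised learning on the agent's own trajectories, with the achieved return-to-go appended to each input, then condition on a high target return at test time.

```python
import numpy as np

def make_dataset(trajectories, gamma=0.99):
    """Supervised data from the agent's own experience.

    Each trajectory is a list of (obs, action, reward) triples; inputs are
    observations with the achieved return-to-go appended, targets are the
    actions actually taken.
    """
    xs, ys = [], []
    for traj in trajectories:
        ret = 0.0
        for obs, action, reward in reversed(traj):
            ret = reward + gamma * ret       # return-to-go from this step
            xs.append(np.append(obs, ret))   # input = (observation, return)
            ys.append(action)
    return np.array(xs), np.array(ys)

# At test time, act with policy(np.append(obs, target_return)) for a high
# target_return, turning supervised learning into policy improvement.
```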
arXiv Detail & Related papers (2019-12-31T18:07:43Z)