Lexicographic Multi-Objective Reinforcement Learning
- URL: http://arxiv.org/abs/2212.13769v1
- Date: Wed, 28 Dec 2022 10:22:36 GMT
- Title: Lexicographic Multi-Objective Reinforcement Learning
- Authors: Joar Skalse, Lewis Hammond, Charlie Griffin, Alessandro Abate
- Abstract summary: We present a family of both action-value and policy gradient algorithms that can be used to solve lexicographic multi-objective problems.
We show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
- Score: 65.90380946224869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we introduce reinforcement learning techniques for solving
lexicographic multi-objective problems. These are problems that involve
multiple reward signals, and where the goal is to learn a policy that maximises
the first reward signal, and subject to this constraint also maximises the
second reward signal, and so on. We present a family of both action-value and
policy gradient algorithms that can be used to solve such problems, and prove
that they converge to policies that are lexicographically optimal. We evaluate
the scalability and performance of these algorithms empirically, demonstrating
their practical applicability. As a more specific application, we show how our
algorithms can be used to impose safety constraints on the behaviour of an
agent, and compare their performance in this context with that of other
constrained reinforcement learning algorithms.
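The action-value side of this idea is straightforward to make concrete. Below is a minimal tabular sketch, not the paper's exact algorithm: one Q-table per objective, with the action set filtered objective by objective so that lower-priority rewards only break (approximate) ties left by higher-priority ones; the slack `eps`, step sizes, and update rule are illustrative.

```python
import numpy as np

def lex_greedy_actions(q_tables, state, eps=1e-3):
    """Actions that are approximately lexicographically optimal in `state`.

    `q_tables` is a list of (n_states, n_actions) arrays, highest-priority
    objective first; `eps` is the slack that turns near-optimal actions
    into ties for the next objective to break.
    """
    actions = np.arange(q_tables[0].shape[1])
    for q in q_tables:
        vals = q[state, actions]
        actions = actions[vals >= vals.max() - eps]  # keep near-optimal ties
    return actions

def lex_q_update(q_tables, s, a, rewards, s_next, alpha=0.1, gamma=0.99):
    """One TD update per objective, bootstrapping each objective at a
    lexicographically optimal next action (one reward signal per objective)."""
    a_next = lex_greedy_actions(q_tables, s_next)[0]
    for q, r in zip(q_tables, rewards):
        q[s, a] += alpha * (r + gamma * q[s_next, a_next] - q[s, a])
```

With a safety signal as the first objective and the task reward as the second, this is one way to realise the safety-constraint application mentioned above: the agent optimises the task only among actions that are near-optimal for safety.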
Related papers
- Thresholded Lexicographic Ordered Multiobjective Reinforcement Learning [0.0]
Lexicographic multi-objective problems, which impose a lexicographic importance order over the objectives, arise in many real-life scenarios.
Existing Reinforcement Learning work directly addressing lexicographic tasks has been scarce.
We present a policy optimization approach built on our Lexicographic Projection Optimization (LPO) algorithm, which has the potential to address the theoretical and practical concerns these tasks raise.
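The summary does not spell out the projection, so the following is a hedged, PCGrad-style reading rather than LPO's published rule: when a lower-priority gradient conflicts with a higher-priority one, remove the conflicting component so the higher-priority objective cannot be harmed.

```python
import numpy as np

def project_lex(g_low, g_high):
    """Project `g_low` onto the half-space {g : g . g_high >= 0}.

    An illustrative lexicographic projection (an assumption, not LPO's
    exact construction): the low-priority update keeps only the part
    that does not oppose the high-priority gradient.
    """
    dot = g_low @ g_high
    if dot >= 0:
        return g_low                                   # no conflict
    return g_low - (dot / (g_high @ g_high)) * g_high  # drop conflicting part
```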
arXiv Detail & Related papers (2024-08-24T06:32:30Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
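Generalized Policy Improvement itself is compact: given value functions for a set of previously learned policies, act greedily with respect to their pointwise maximum, which is guaranteed to be no worse than any policy in the set. A sketch with successor features, where each policy's action values under a preference vector $w$ are $\psi_i(s,a) \cdot w$ (array shapes are illustrative):

```python
import numpy as np

def gpi_action(psi, w):
    """Greedy GPI action from successor features.

    psi: (n_policies, n_actions, d) successor features at the current state;
    w:   (d,) preference vector over the d reward features.
    """
    q = psi @ w                         # (n_policies, n_actions) action values
    return int(q.max(axis=0).argmax())  # greedy w.r.t. the pointwise maximum
```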
arXiv Detail & Related papers (2023-01-18T20:54:40Z)
- Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them.
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
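The hierarchical estimator is the paper's contribution; the "act pessimistically" step can only be sketched generically here. For a linear bandit, one scores each arm by a lower confidence bound under some posterior over the task parameter and pulls the maximiser; the posterior quantities below are placeholders, not HierOPO's estimates.

```python
import numpy as np

def pessimistic_arm(features, theta_mean, theta_cov, beta=1.0):
    """Pick the arm maximising a lower confidence bound on its reward.

    features: (n_arms, d) arm feature vectors; theta_mean (d,) and
    theta_cov (d, d) are placeholder posterior moments of the task
    parameter; beta scales the pessimism.
    """
    means = features @ theta_mean
    widths = np.sqrt(np.einsum("ad,dk,ak->a", features, theta_cov, features))
    return int(np.argmax(means - beta * widths))   # lower confidence bound
```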
arXiv Detail & Related papers (2022-12-09T08:26:27Z)
- Attaining Interpretability in Reinforcement Learning via Hierarchical Primitive Composition [3.1078562713129765]
We propose a novel hierarchical reinforcement learning algorithm that improves interpretability by decomposing the original task into a hierarchy.
We show how the proposed scheme can be employed in practice by solving a pick-and-place task with a 6-DoF manipulator.
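As a hedged illustration of such a decomposition (the primitives and termination tests are invented for a pick-and-place flavour, not taken from the paper), the top-level decision is a named primitive, which is what makes the behaviour inspectable:

```python
import numpy as np

def reach(obs):
    return obs["object"] - obs["gripper"]   # move gripper toward the object

def place(obs):
    return obs["target"] - obs["gripper"]   # move gripper toward the target

PRIMITIVES = {"reach": reach, "place": place}

def primitive_done(name, obs, tol=1e-2):
    goal = obs["object"] if name == "reach" else obs["target"]
    return np.linalg.norm(goal - obs["gripper"]) < tol

def high_level_step(obs):
    # The top level is interpretable: its output is a named primitive.
    name = "reach" if not primitive_done("reach", obs) else "place"
    return name, PRIMITIVES[name](obs)      # (readable label, low-level action)
```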
arXiv Detail & Related papers (2021-10-05T05:59:31Z)
- Inverse Reinforcement Learning with Explicit Policy Estimates [19.159290496678004]
Various methods for solving the inverse reinforcement learning problem have been developed independently in machine learning and economics.
We show that they all belong to a class of optimization problems characterized by a common form of the gradient, the associated policy, and the objective.
Using insights which emerge from our study of this class of optimization problems, we identify various problem scenarios and investigate each method's suitability for these problems.
arXiv Detail & Related papers (2021-03-04T07:00:58Z)
- State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds.
We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards.
This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods.
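The mechanics can be sketched briefly (step size and update cadence are illustrative): the multipliers join the observation, so a single policy can condition its behaviour on how binding each constraint currently is, while dual ascent raises a multiplier whenever its constraint return falls short of the threshold.

```python
import numpy as np

def augment(obs, lambdas):
    """Policy input includes the current Lagrange multipliers."""
    return np.concatenate([obs, lambdas])

def dual_step(lambdas, constraint_returns, thresholds, eta=0.01):
    """Dual ascent: a shortfall below a threshold pushes its multiplier up;
    multipliers stay non-negative."""
    return np.maximum(0.0, lambdas + eta * (thresholds - constraint_returns))
```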
arXiv Detail & Related papers (2021-02-23T21:07:35Z)
- Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks [14.461847761198037]
This paper focuses on finding reinforcement learning policies for control systems with hard state and action constraints.
Previous works seeking to ensure constraint satisfaction, or safety, have focused on adding a projection step to a learned policy.
To tackle this problem, this paper proposes a new approach, termed Vertex Networks (VNs), with guarantees on safety during exploration and on learned control policies.
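The geometric idea behind the name can be sketched directly: if the safe set is a polytope with known vertices, a network that outputs convex-combination weights over those vertices can only ever emit actions inside that set. The sketch below assumes the vertices are given and uses a softmax over learned logits; shapes are illustrative.

```python
import numpy as np

def vertex_action(logits, vertices):
    """Map network logits to a safe action by convex combination.

    logits: (n_vertices,) outputs of a learned network (placeholder here);
    vertices: (n_vertices, action_dim) vertices of the safe polytope.
    """
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                # softmax -> convex weights
    return weights @ vertices               # stays inside the polytope
```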
arXiv Detail & Related papers (2020-03-20T20:32:20Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
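One standard way a critic over a discrete action set controls variance, offered as a hedged sketch rather than the paper's exact estimator, is to combine the critic's values across all actions: for a softmax policy, the gradient of the expected critic value has a closed form in which the policy's own value acts as a built-in baseline.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def all_actions_grad(logits, q_values):
    """Gradient of sum_a pi_a * Q_a w.r.t. the policy logits.

    logits, q_values: (n_actions,). The closed form pi * (Q - pi . Q)
    needs no sampled return, so the estimate has no Monte Carlo variance
    given the critic.
    """
    pi = softmax(logits)
    baseline = pi @ q_values            # state value under the current policy
    return pi * (q_values - baseline)
```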
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
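The mechanism is concrete enough to sketch (training-loop details simplified): fit the policy by supervised learning on the agent's own trajectories, with the achieved return-to-go appended to each input, then condition on a high target return at test time.

```python
import numpy as np

def make_dataset(trajectories, gamma=0.99):
    """Supervised data from the agent's own experience.

    Each trajectory is a list of (obs, action, reward) triples; inputs are
    observations with the achieved return-to-go appended, targets are the
    actions actually taken.
    """
    xs, ys = [], []
    for traj in trajectories:
        ret = 0.0
        for obs, action, reward in reversed(traj):
            ret = reward + gamma * ret       # return-to-go from this step
            xs.append(np.append(obs, ret))   # input = (observation, return)
            ys.append(action)
    return np.array(xs), np.array(ys)

# At test time, act with policy(np.append(obs, target_return)) for a high
# target_return, turning supervised learning into policy improvement.
```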
arXiv Detail & Related papers (2019-12-31T18:07:43Z)