gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement
Learning Approach
- URL: http://arxiv.org/abs/2204.04988v1
- Date: Mon, 11 Apr 2022 10:06:49 GMT
- Title: gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement
Learning Approach
- Authors: Johannes Dornheim
- Abstract summary: Generalized Thresholded Lexicographic Ordering (gTLO) is a novel method that aims to combine non-linear MORL with the advantages of generalized MORL.
We present promising results on a standard benchmark for non-linear MORL and a real-world application from the domain of manufacturing process control.
- Score: 2.0305676256390934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In real-world decision optimization, multiple competing objectives must
often be taken into account. Under classical reinforcement learning, these
objectives have to be combined into a single scalar reward function. In contrast,
multi-objective reinforcement learning (MORL) methods learn from vectors of
per-objective rewards instead. In the case of multi-policy MORL, sets of
decision policies for various preferences regarding the conflicting objectives
are optimized. This is especially important when target preferences are not
known during training or when preferences change dynamically during
application. While it is, in general, straightforward to extend a
single-objective reinforcement learning method for MORL based on linear
scalarization, solutions that are reachable by these methods are limited to
convex regions of the Pareto front. Non-linear MORL methods like Thresholded
Lexicographic Ordering (TLO) are designed to overcome this limitation.
Generalized MORL methods utilize function approximation to generalize across
objective preferences and thereby implicitly learn multiple policies in a
data-efficient manner, even for complex decision problems with high-dimensional
or continuous state spaces. In this work, we propose generalized
Thresholded Lexicographic Ordering (gTLO), a novel method that aims to combine
non-linear MORL with the advantages of generalized MORL. We introduce a deep
reinforcement learning realization of the algorithm and present promising
results on a standard benchmark for non-linear MORL and a real-world
application from the domain of manufacturing process control.
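To make the non-linear scalarization concrete: TLO selects actions by clipping each thresholded objective's value estimate at its threshold and comparing actions lexicographically, which makes preferences on the concave parts of the Pareto front reachable. A minimal sketch of that selection rule, assuming per-objective Q-values are already available (the Q-values and thresholds below are illustrative, not from the paper):

```python
import numpy as np

def tlo_action(q_values: np.ndarray, thresholds: np.ndarray) -> int:
    """Thresholded lexicographic action selection.

    q_values:   shape (n_actions, n_objectives), per-objective Q-value
                estimates for one state.
    thresholds: shape (n_objectives - 1,); objective i only needs to be
                satisfied up to thresholds[i], the last objective is
                maximized unconstrained.
    """
    n_actions, n_objectives = q_values.shape
    # Clip every thresholded objective at its threshold: values above the
    # threshold are all "good enough" and become indistinguishable.
    keys = q_values.copy()
    keys[:, :-1] = np.minimum(keys[:, :-1], thresholds)
    # Lexicographic argmax over the clipped objective values.
    best = 0
    for a in range(1, n_actions):
        for o in range(n_objectives):
            if keys[a, o] > keys[best, o]:
                best = a
                break
            if keys[a, o] < keys[best, o]:
                break
    return best

# Toy example: 3 actions, 2 objectives, threshold 0.5 on objective 0.
q = np.array([[0.9, 0.1],   # over-satisfies objective 0
              [0.6, 0.4],   # also satisfies it, better on objective 1
              [0.3, 0.8]])  # violates the threshold
print(tlo_action(q, np.array([0.5])))  # -> 1
```

Unlike a linear scalarization w @ q, which would happily trade threshold violations against gains in other objectives, the clipping step makes "satisfy objective 0 first, then optimize objective 1" an explicit, non-linear preference.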
Related papers
- C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front [9.04360155372014] (2024-10-03)
Constrained MORL is a seamless bridge between constrained policy optimization and MORL.
Our algorithm achieves more consistent and superior performances in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks.
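Hypervolume, used as the headline metric here and in several entries below, is the volume of objective space dominated by a policy set's returns relative to a reference point. A minimal two-objective (maximization) sketch with made-up points:

```python
def hypervolume_2d(points, ref):
    """Hypervolume of a 2-objective (maximization) front w.r.t. ref.

    points: list of (f1, f2) pairs, assumed >= ref component-wise.
    ref:    reference point dominated by every front point.
    """
    # Keep only non-dominated points, scanned by f1 descending.
    pts = sorted(points, key=lambda p: p[0], reverse=True)
    front, best_f2 = [], float("-inf")
    for p in pts:
        if p[1] > best_f2:          # not dominated by a point to its right
            front.append(p)
            best_f2 = p[1]
    # Sum the disjoint rectangles swept from right to left.
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in front:
        hv += (f1 - ref[0]) * (f2 - prev_f2)
        prev_f2 = f2
    return hv

print(hypervolume_2d([(4, 1), (3, 3), (1, 4)], ref=(0, 0)))  # -> 11.0
```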
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295] (2024-05-01)
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours that trade off between multiple, often conflicting objectives.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
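The paper's actual acquisition strategy is not reproduced here; the sketch below just shows the standard UCB1 rule applied to a hypothetical finite pool of candidate weight vectors, treating each pool entry as a bandit arm whose reward is the utility achieved under those weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite pool of candidate weight vectors (2 objectives).
weight_pool = np.array([[1.0, 0.0], [0.75, 0.25], [0.5, 0.5],
                        [0.25, 0.75], [0.0, 1.0]])
counts = np.zeros(len(weight_pool))   # pulls per weight vector
means = np.zeros(len(weight_pool))    # running mean utility
c = 2.0                               # exploration coefficient

def evaluate_utility(w):
    """Stand-in for training/evaluating a policy under weights w."""
    return float(w @ np.array([0.3, 0.7]) + 0.1 * rng.standard_normal())

for t in range(1, 201):
    if t <= len(weight_pool):         # pull every arm once first
        arm = t - 1
    else:                             # then pick the highest UCB score
        ucb = means + np.sqrt(c * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    r = evaluate_utility(weight_pool[arm])
    counts[arm] += 1
    means[arm] += (r - means[arm]) / counts[arm]

print("most promising weights:", weight_pool[int(np.argmax(means))])
```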
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765] (2023-07-20)
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
- Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning [2.1408617023874443] (2023-03-15)
We propose a novel multi-objective reinforcement learning (MORL) algorithm that trains a single neural network via policy gradient.
The proposed method works in both continuous and discrete action spaces with no design change of the policy network.
- Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization [8.836422771217084] (2023-01-18)
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
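Generalized Policy Improvement itself is compact: act greedily with respect to the pointwise maximum of the scalarized Q-functions of previously learned policies, which is guaranteed to do at least as well as every policy in the set. A minimal sketch of that selection step (the paper's prioritization schemes built on top of GPI are not shown):

```python
import numpy as np

def gpi_action(q_sets: np.ndarray, w: np.ndarray) -> int:
    """GPI action selection for one state.

    q_sets: shape (n_policies, n_actions, n_objectives) -- per-objective
            Q-value estimates of each previously learned policy.
    w:      shape (n_objectives,) -- current preference weights.
    """
    scalar_q = q_sets @ w                    # (n_policies, n_actions)
    # Pointwise maximum over policies, then greedy over actions.
    return int(np.argmax(scalar_q.max(axis=0)))

# Toy example: 2 policies, 3 actions, 2 objectives.
q_sets = np.array([[[1.0, 0.0], [0.2, 0.2], [0.0, 0.1]],
                   [[0.0, 0.9], [0.1, 0.3], [0.4, 0.4]]])
print(gpi_action(q_sets, w=np.array([0.5, 0.5])))  # -> 0
```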
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289] (2022-09-15)
We present a new policy gradient algorithm for topological Markov decision processes (TMDPs), derived as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
- PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm [0.18416014644193063] (2022-08-16)
We propose a novel MORL algorithm that trains a single universal network to cover the entire preference space and scales to continuous robotic tasks.
PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.
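The "single universal network" idea amounts to feeding the preference vector to the network as an extra input, so one set of weights serves the whole preference space. A minimal PyTorch sketch; the layer sizes and multi-objective output head are illustrative, not PD-MORL's actual architecture:

```python
import torch
import torch.nn as nn

class PreferenceConditionedQNet(nn.Module):
    """Q(s, a, w): one network for the whole preference space."""

    def __init__(self, state_dim, n_actions, n_objectives, hidden=128):
        super().__init__()
        self.n_actions = n_actions
        self.n_objectives = n_objectives
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_objectives, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            # One Q-vector (one value per objective) per action.
            nn.Linear(hidden, n_actions * n_objectives),
        )

    def forward(self, state, pref):
        x = torch.cat([state, pref], dim=-1)  # condition on preference
        q = self.net(x)
        return q.view(-1, self.n_actions, self.n_objectives)

# Greedy action for one state under a given preference vector.
net = PreferenceConditionedQNet(state_dim=4, n_actions=3, n_objectives=2)
s = torch.randn(1, 4)
w = torch.tensor([[0.7, 0.3]])
q = net(s, w)                                       # (1, 3, 2)
action = (q @ w.unsqueeze(-1)).squeeze(-1).argmax(dim=-1)
print(action)
```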
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599] (2021-07-08)
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645] (2021-05-24)
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
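For intuition, the unregularized KL-divergence special case of policy mirror descent has a closed-form update, pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta * Q(s, a)); GPMD additionally folds the convex regularizer into this step. A tabular single-state sketch of that classic update (not the paper's full algorithm):

```python
import numpy as np

def pmd_step(pi, q, eta):
    """One KL-based policy mirror descent update for a single state.

    pi:  current action distribution, shape (n_actions,)
    q:   Q-value estimates under pi, shape (n_actions,)
    eta: learning rate
    """
    logits = np.log(pi) + eta * q            # mirror step in dual space
    new_pi = np.exp(logits - logits.max())   # numerically stable softmax
    return new_pi / new_pi.sum()

pi = np.full(3, 1.0 / 3.0)
q = np.array([1.0, 0.5, 0.0])
for _ in range(10):
    pi = pmd_step(pi, q, eta=0.5)
print(pi)  # probability mass concentrates on the highest-Q action
```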
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848] (2020-11-19)
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective Q-learning algorithm.
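The "envelope" refers to maximizing scalarized utility not only over next actions but also over the preferences the Q-function was trained on, taking the full Q-vector at that maximizer as the bootstrap target. A sketch of that backup for a single transition, with tabular per-preference Q-vectors (illustrative of envelope Q-learning, not the paper's model-based construction):

```python
import numpy as np

def envelope_target(r, q_next, w, gamma=0.99):
    """Envelope backup target for one transition (s, a, r, s').

    r:      reward vector, shape (n_objectives,)
    q_next: Q-vectors at s' for every (preference, action) pair,
            shape (n_prefs, n_actions, n_objectives)
    w:      preference weights this sample is being trained under
    """
    # Scalarize with the *current* preference, then take the envelope:
    # max over both next actions and the preferences Q was trained on.
    utilities = q_next @ w                    # (n_prefs, n_actions)
    p, a = np.unravel_index(np.argmax(utilities), utilities.shape)
    return r + gamma * q_next[p, a]           # full Q-vector as target

r = np.array([0.0, 1.0])
q_next = np.random.default_rng(1).random((4, 3, 2))
print(envelope_target(r, q_next, w=np.array([0.6, 0.4])))
```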