PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning
Algorithm
- URL: http://arxiv.org/abs/2208.07914v3
- Date: Mon, 29 May 2023 19:29:13 GMT
- Title: PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning
Algorithm
- Authors: Toygun Basaklar, Suat Gumussoy, Umit Y. Ogras
- Abstract summary: We propose a novel MORL algorithm that trains a single universal network to cover the entire preference space scalable to continuous robotic tasks.
PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.
- Score: 0.18416014644193063
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Multi-objective reinforcement learning (MORL) approaches have emerged to
tackle many real-world problems with multiple conflicting objectives by
maximizing a joint objective function weighted by a preference vector. These
approaches find fixed customized policies corresponding to preference vectors
specified during training. However, the design constraints and objectives
typically change dynamically in real-life scenarios. Furthermore, storing a
policy for each potential preference is not scalable. Hence, obtaining a set of
Pareto front solutions for the entire preference space in a given domain with a
single training is critical. To this end, we propose a novel MORL algorithm
that trains a single universal network to cover the entire preference space
scalable to continuous robotic tasks. The proposed approach, Preference-Driven
MORL (PD-MORL), utilizes the preferences as guidance to update the network
parameters. It also employs a novel parallelization approach to increase sample
efficiency. We show that PD-MORL achieves up to 25% larger hypervolume for
challenging continuous control tasks and uses an order of magnitude fewer
trainable parameters compared to prior approaches.
Related papers
- C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front [9.04360155372014]
Constrained MORL is a seamless bridge between constrained policy optimization and MORL.
Our algorithm achieves more consistent and superior performances in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks.
arXiv Detail & Related papers (2024-10-03T06:13:56Z) - Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO)
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [72.25707314772254]
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspace, while the inner level focuses on updating parameters to enhance the overall performance of the unified policy.
arXiv Detail & Related papers (2024-05-28T11:41:41Z) - Generalized Multi-Objective Reinforcement Learning with Envelope Updates in URLLC-enabled Vehicular Networks [12.323383132739195]
We develop a novel multi-objective reinforcement learning framework to jointly optimize wireless network selection and autonomous driving policies.
The proposed framework is designed to maximize the traffic flow and minimize collisions by controlling the vehicle's motion dynamics.
The proposed policies enable autonomous vehicles to adopt safe driving behaviors with improved connectivity.
arXiv Detail & Related papers (2024-05-18T16:31:32Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z) - Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL [22.468486569700236]
The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives.
We propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent.
PEDA is a family of offline MORL algorithms that builds and extends Decision Transformers via a novel preference-and-return-conditioned policy.
arXiv Detail & Related papers (2023-04-30T20:15:26Z) - Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning [2.1408617023874443]
We propose a novel multi-objective reinforcement learning (MORL) algorithm that trains a single neural network via policy gradient.
The proposed method works in both continuous and discrete action spaces with no design change of the policy network.
arXiv Detail & Related papers (2023-03-15T20:07:48Z) - Pareto Manifold Learning: Tackling multiple tasks via ensembles of
single-task models [50.33956216274694]
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution.
We propose textitPareto Manifold Learning, an ensembling method in weight space.
arXiv Detail & Related papers (2022-10-18T11:20:54Z) - gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement
Learning Approach [2.0305676256390934]
Generalized Thresholded Lexicographic Ordering (gTLO) is a novel method that aims to combine non-linear MORL with the advantages of generalized MORL.
We present promising results on a standard benchmark for non-linear MORL and a real-world application from the domain of manufacturing process control.
arXiv Detail & Related papers (2022-04-11T10:06:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.