Learning Pareto Set for Multi-Objective Continuous Robot Control
- URL: http://arxiv.org/abs/2406.18924v1
- Date: Thu, 27 Jun 2024 06:31:51 GMT
- Title: Learning Pareto Set for Multi-Objective Continuous Robot Control
- Authors: Tianye Shu, Ke Shang, Cheng Gong, Yang Nan, Hisao Ishibuchi,
- Abstract summary: We propose a simple and resource-efficient MORL algorithm that learns a continuous representation of the Pareto set in a high-dimensional policy parameter space.
Experimental results show that our method achieves the best overall performance with the least training parameters.
- Score: 7.853788769559891
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For a control problem with multiple conflicting objectives, there exists a set of Pareto-optimal policies called the Pareto set instead of a single optimal policy. When a multi-objective control problem is continuous and complex, traditional multi-objective reinforcement learning (MORL) algorithms search for many Pareto-optimal deep policies to approximate the Pareto set, which is quite resource-consuming. In this paper, we propose a simple and resource-efficient MORL algorithm that learns a continuous representation of the Pareto set in a high-dimensional policy parameter space using a single hypernet. The learned hypernet can directly generate various well-trained policy networks for different user preferences. We compare our method with two state-of-the-art MORL algorithms on seven multi-objective continuous robot control problems. Experimental results show that our method achieves the best overall performance with the least training parameters. An interesting observation is that the Pareto set is well approximated by a curved line or surface in a high-dimensional parameter space. This observation will provide insight for researchers to design new MORL algorithms.
Related papers
- C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front [9.04360155372014]
Constrained MORL is a seamless bridge between constrained policy optimization and MORL.
Our algorithm achieves more consistent and superior performances in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks.
arXiv Detail & Related papers (2024-10-03T06:13:56Z) - Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - Pareto Manifold Learning: Tackling multiple tasks via ensembles of
single-task models [50.33956216274694]
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution.
We propose textitPareto Manifold Learning, an ensembling method in weight space.
arXiv Detail & Related papers (2022-10-18T11:20:54Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning
Algorithm [0.18416014644193063]
We propose a novel MORL algorithm that trains a single universal network to cover the entire preference space scalable to continuous robotic tasks.
PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.
arXiv Detail & Related papers (2022-08-16T19:23:02Z) - Policy Search for Model Predictive Control with Application to Agile
Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework for MPC.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z) - Pareto Navigation Gradient Descent: a First-Order Algorithm for
Optimization in Pareto Set [17.617944390196286]
Modern machine learning applications, such as multi-task learning, require finding optimal model parameters to trade-off multiple objective functions.
We propose a first-order algorithm that approximately solves OPT-in-Pareto using only gradient information.
arXiv Detail & Related papers (2021-10-17T04:07:04Z) - Imitation Learning from MPC for Quadrupedal Multi-Gait Control [63.617157490920505]
We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot.
We use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control.
We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.
arXiv Detail & Related papers (2021-03-26T08:48:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.