Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation
- URL: http://arxiv.org/abs/2408.12110v1
- Date: Thu, 22 Aug 2024 03:51:39 GMT
- Title: Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation
- Authors: Woo Kyung Kim, Minjong Yoo, Honguk Woo
- Abstract summary: In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates for regularizing the discriminator.
We show that ParIRL outperforms other IRL algorithms for various multi-objective control tasks.
- Score: 6.876580618014666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-driven offline reinforcement learning and imitation learning approaches have been gaining popularity in addressing sequential decision-making problems. Yet, these approaches rarely consider learning Pareto-optimal policies from a limited pool of expert datasets. This becomes particularly marked due to practical limitations in obtaining comprehensive datasets for all preferences, where multiple conflicting objectives exist and each expert might hold a unique optimization preference for these objectives. In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates for regularizing the discriminator. This enables progressive generation of a set of policies that accommodate diverse preferences on the multiple objectives, while using only two distinct datasets, each associated with a different expert preference. In doing so, we present a Pareto IRL framework (ParIRL) that establishes a Pareto policy set from these limited datasets. In the framework, the Pareto policy set is then distilled into a single, preference-conditioned diffusion model, thus allowing users to immediately specify which expert's patterns they prefer. Through experiments, we show that ParIRL outperforms other IRL algorithms for various multi-objective control tasks, achieving the dense approximation of the Pareto frontier. We also demonstrate the applicability of ParIRL with autonomous driving in CARLA.
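The abstract describes the mechanism only at a high level (reward-distance regularization of the discriminator, then distillation of the resulting policy set into a preference-conditioned diffusion model) and gives no formulas. As a rough illustration of the general idea, not the paper's actual algorithm, the PyTorch sketch below trains a GAIL-style discriminator against both expert datasets and adds an assumed reward-distance term that pulls the reward assigned to the current policy toward an alpha-weighted mixture of the two experts' rewards. The Discriminator class, discriminator_loss, alpha, and reg_coef are all hypothetical names introduced here for illustration.
```python
# Hypothetical sketch (not the paper's code): a GAIL-style discriminator whose
# implied reward is nudged toward an interpolation between two expert datasets.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))  # classification logit

    def reward(self, obs, act):
        # Common GAIL-style reward surrogate: -log(1 - D(s, a))
        return -F.logsigmoid(-self(obs, act))

def discriminator_loss(disc, expert_a, expert_b, policy_batch, alpha, reg_coef=1.0):
    """alpha in [0, 1] is a hypothetical preference between the two experts."""
    obs_p, act_p = policy_batch
    # Standard adversarial terms: both expert datasets are "real", policy is "fake".
    logits_a, logits_b, logits_p = disc(*expert_a), disc(*expert_b), disc(obs_p, act_p)
    adv_loss = (
        alpha * F.binary_cross_entropy_with_logits(logits_a, torch.ones_like(logits_a))
        + (1 - alpha) * F.binary_cross_entropy_with_logits(logits_b, torch.ones_like(logits_b))
        + F.binary_cross_entropy_with_logits(logits_p, torch.zeros_like(logits_p))
    )
    # Assumed reward-distance regularizer: keep the reward assigned to the policy's
    # samples close to an alpha-weighted mixture of the rewards the discriminator
    # assigns to the two expert datasets.
    r_a = disc.reward(*expert_a).mean().detach()
    r_b = disc.reward(*expert_b).mean().detach()
    target = alpha * r_a + (1 - alpha) * r_b
    reg = (disc.reward(obs_p, act_p).mean() - target).pow(2)
    return adv_loss + reg_coef * reg
```
Under these assumptions, sweeping alpha over [0, 1] and training one imitation policy per value would yield a set of policies trading off the two expert preferences, which is the role the abstract ascribes to its progressively generated Pareto policy set.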
Related papers
- C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front [9.04360155372014]
Constrained MORL serves as a seamless bridge between constrained policy optimization and MORL.
Our algorithm achieves more consistent and superior performance in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks.
arXiv Detail & Related papers (2024-10-03T06:13:56Z)
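Several entries in this list evaluate a learned policy set by its Pareto front and hypervolume. As a terminology aid only, not taken from any of the cited papers, the sketch below filters a set of per-policy return vectors down to the non-dominated ones, i.e., the empirical Pareto front under maximization.
```python
# Illustrative helper, not from the cited papers: keep only non-dominated
# return vectors (empirical Pareto front) under "higher is better" objectives.
from typing import List, Sequence

def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v if u is >= v in every objective and > in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Example: returns of four policies on (speed, energy efficiency).
policies = [(9.0, 2.0), (7.0, 5.0), (6.0, 4.0), (4.0, 8.0)]
print(pareto_front(policies))  # [(9.0, 2.0), (7.0, 5.0), (4.0, 8.0)]
```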
- Learning Pareto Set for Multi-Objective Continuous Robot Control [7.853788769559891]
We propose a simple and resource-efficient MORL algorithm that learns a continuous representation of the Pareto set in a high-dimensional policy parameter space.
Experimental results show that our method achieves the best overall performance with the fewest training parameters.
arXiv Detail & Related papers (2024-06-27T06:31:51Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergence (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL [22.468486569700236]
The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives.
We propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent.
PEDA is a family of offline MORL algorithms that builds and extends Decision Transformers via a novel preference-and-return-conditioned policy.
arXiv Detail & Related papers (2023-04-30T20:15:26Z)
- Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models [50.33956216274694]
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution.
We propose Pareto Manifold Learning, an ensembling method in weight space.
arXiv Detail & Related papers (2022-10-18T11:20:54Z)
- PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm [0.18416014644193063]
We propose a novel MORL algorithm that trains a single universal network to cover the entire preference space and scales to continuous robotic tasks.
PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.
arXiv Detail & Related papers (2022-08-16T19:23:02Z)
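PD-MORL's summary mentions a single universal network covering the preference space, but its architecture is not given here. The following is only a generic, hypothetical sketch of a preference-conditioned policy (all names and dimensions are assumptions), showing how one set of weights can be queried for arbitrary trade-offs.
```python
# Generic sketch of a preference-conditioned policy (not PD-MORL's actual network):
# the preference weight vector over objectives is simply concatenated to the state.
import torch
import torch.nn as nn

class PreferenceConditionedPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, n_objectives: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_objectives, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # continuous actions in [-1, 1]
        )

    def forward(self, obs: torch.Tensor, pref: torch.Tensor) -> torch.Tensor:
        # pref is a non-negative weight vector over objectives that sums to 1.
        return self.net(torch.cat([obs, pref], dim=-1))

# Querying the same network for two different trade-offs:
policy = PreferenceConditionedPolicy(obs_dim=8, act_dim=2, n_objectives=2)
obs = torch.zeros(1, 8)
a_speed = policy(obs, torch.tensor([[0.9, 0.1]]))   # favor objective 1
a_energy = policy(obs, torch.tensor([[0.1, 0.9]]))  # favor objective 2
```
Conditioning on the preference vector is what lets a single network stand in for a whole family of single-preference policies.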
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance over the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Imitation Learning from MPC for Quadrupedal Multi-Gait Control [63.617157490920505]
We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot.
We use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control.
We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.
arXiv Detail & Related papers (2021-03-26T08:48:53Z)
- Multi-Objective Learning to Predict Pareto Fronts Using Hypervolume Maximization [0.0]
Real-world problems are often multi-objective with decision-makers unable to specify a priori which trade-off between the conflicting objectives is preferable.
We propose a novel learning approach to estimate the Pareto front by maximizing the dominated hypervolume (HV) of the average loss vectors corresponding to a set of learners.
In our approach, the set of learners are trained multi-objectively with a dynamic loss function, wherein each learner's losses are weighted by their HV maximizing gradients.
Experiments on three different multi-objective tasks show that the outputs of the set of learners are indeed well-spread on the Pareto front.
arXiv Detail & Related papers (2021-02-08T20:41:21Z)
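The last entry weights each learner's losses by hypervolume-maximizing gradients; the exact scheme is not reproduced in this summary. The sketch below only illustrates the dominated hypervolume itself for the two-objective minimization case, with an assumed reference point; it is not that paper's implementation.
```python
# Illustrative only: dominated hypervolume of 2-D loss vectors (lower is better)
# with respect to an assumed reference point; not the cited paper's implementation.
from typing import List, Tuple

def hypervolume_2d(points: List[Tuple[float, float]],
                   ref: Tuple[float, float]) -> float:
    """Area dominated by `points` and bounded above by `ref` (minimization)."""
    # Keep only non-dominated points (no other point is <= in both objectives).
    front = [p for p in points
             if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)]
    front.sort()                      # ascending in the first objective
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        area += (ref[0] - x) * (prev_y - y)   # horizontal slab between y levels
        prev_y = y
    return area

# Average loss vectors of three learners, reference point (6, 6):
losses = [(1.0, 4.0), (2.0, 2.0), (5.0, 1.0)]
print(hypervolume_2d(losses, (6.0, 6.0)))  # 19.0
```
Since these are losses, lower is better: the dominated region lies between each loss vector and the reference point, so a larger hypervolume indicates average losses that are both low and well spread across the objectives.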