Human-in-the-Loop Policy Optimization for Preference-Based
Multi-Objective Reinforcement Learning
- URL: http://arxiv.org/abs/2401.02160v1
- Date: Thu, 4 Jan 2024 09:17:53 GMT
- Title: Human-in-the-Loop Policy Optimization for Preference-Based
Multi-Objective Reinforcement Learning
- Authors: Ke Li, Han Guo
- Abstract summary: We propose a human-in-the-loop policy optimization framework for preference-based MORL.
Our method proactively learns the DM's implicit preference information without requiring any a priori knowledge.
We evaluate our approach against three conventional MORL algorithms and four state-of-the-art preference-based MORL algorithms.
- Score: 13.627087954965695
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-objective reinforcement learning (MORL) aims to find a set of
high-performing and diverse policies that address trade-offs between multiple
conflicting objectives. However, in practice, decision makers (DMs) often
deploy only one or a limited number of trade-off policies. Providing too many
diversified trade-off policies to the DM not only significantly increases their
workload but also introduces noise in multi-criterion decision-making. With
this in mind, we propose a human-in-the-loop policy optimization framework for
preference-based MORL that interactively identifies policies of interest. Our
method proactively learns the DM's implicit preference information without
requiring any a priori knowledge, which is often unavailable in real-world
black-box decision scenarios. The learned preference information is used to
progressively guide policy optimization towards policies of interest. We
evaluate our approach against three conventional MORL algorithms that do not
consider preference information and four state-of-the-art preference-based MORL
algorithms on two MORL environments for robot control and smart grid
management. Experimental results demonstrate the effectiveness of our proposed
method in comparison with the peer algorithms.
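The interactive loop described above can be pictured as: consult the DM on a few pairwise comparisons between trade-off policies, fit a preference model to those comparisons, and use the fitted preferences to steer further policy optimization toward the region of interest. The sketch below is only an illustration of that loop under strong simplifying assumptions: the Bradley-Terry preference model, the linear utility over objective vectors, the simulated `dm_compare` oracle, and all other names are hypothetical and are not taken from the paper, whose actual consultation and optimization scheme may differ.

```python
import numpy as np

rng = np.random.default_rng(0)


def dm_compare(f_a, f_b, hidden_w):
    """Simulated decision maker: returns 1 if policy A is preferred to policy B.
    In a real interactive loop this would be a query answered by the human DM."""
    return 1 if hidden_w @ f_a > hidden_w @ f_b else 0


def fit_preference_weights(pairs, labels, n_obj, lr=0.5, iters=500):
    """Fit a Bradley-Terry style model P(A preferred to B) = sigmoid(w . (f_A - f_B))
    to the recorded comparisons by batch gradient ascent on the log-likelihood."""
    w = np.zeros(n_obj)
    for _ in range(iters):
        grad = np.zeros(n_obj)
        for (f_a, f_b), y in zip(pairs, labels):
            diff = f_a - f_b
            p = 1.0 / (1.0 + np.exp(-w @ diff))
            grad += (y - p) * diff
        w += lr * grad / len(pairs)
    return w


# Candidate trade-off policies, each represented here only by its objective vector
# (in practice each vector would come from evaluating a trained MORL policy).
n_obj = 3
policies = rng.uniform(0.0, 1.0, size=(20, n_obj))

# Hidden DM utility that the interactive loop tries to recover.
hidden_w = np.array([0.6, 0.3, 0.1])

# A small budget of pairwise consultations with the DM.
pairs, labels = [], []
for _ in range(15):
    i, j = rng.choice(len(policies), size=2, replace=False)
    pairs.append((policies[i], policies[j]))
    labels.append(dm_compare(policies[i], policies[j], hidden_w))

w_hat = fit_preference_weights(pairs, labels, n_obj)
w_hat = np.clip(w_hat, 0.0, None)
w_hat /= w_hat.sum() + 1e-12  # normalized scalarization weights

# The learned weights would then steer further policy optimization, e.g. by
# scalarizing the vector reward as r_t = w_hat . r_vec_t, so that search
# concentrates on the DM's region of interest rather than the whole Pareto front.
scores = policies @ w_hat
best = int(np.argmax(scores))
print("learned weights:", np.round(w_hat, 3))
print("policy of interest:", best, "objective vector:", np.round(policies[best], 3))
```

The Bradley-Terry model is just one convenient choice; any preference model that maps pairwise feedback to a utility over objective vectors would slot into the same loop.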
Related papers
- Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning [10.848218400641466]
Multi-objective reinforcement learning (MORL) is used to solve problems involving multiple objectives.
We propose an approach for clustering the solution set generated by MORL.
arXiv Detail & Related papers (2024-11-07T15:26:38Z)
- C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front [9.04360155372014]
Constrained MORL is a seamless bridge between constrained policy optimization and MORL.
Our algorithm achieves more consistent and superior performance in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks.
arXiv Detail & Related papers (2024-10-03T06:13:56Z)
- Personalized Reinforcement Learning with a Budget of Policies [9.846353643883443]
Personalization in machine learning (ML) tailors models' decisions to the individual characteristics of users.
We propose a novel framework termed represented Markov Decision Processes (r-MDPs) that is designed to balance the need for personalization with the regulatory constraints.
In an r-MDP, we cater to a diverse user population, each with unique preferences, through interaction with a small set of representative policies.
We develop two deep reinforcement learning algorithms that efficiently solve r-MDPs.
arXiv Detail & Related papers (2024-01-12T11:27:55Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL [22.468486569700236]
The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives.
We propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent.
PEDA is a family of offline MORL algorithms that builds and extends Decision Transformers via a novel preference-and-return-conditioned policy.
arXiv Detail & Related papers (2023-04-30T20:15:26Z)
- Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally derived prioritization schemes; a minimal GPI sketch is given after this list.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
arXiv Detail & Related papers (2023-01-18T20:54:40Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments [55.41685740015095]
We study offline reinforcement learning under a novel model called strategic MDP.
We propose a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments (PLAN).
arXiv Detail & Related papers (2022-08-23T15:32:44Z)
- CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies [62.39667564455059]
We consider and study a distribution of optimal policies.
In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems.
We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability.
arXiv Detail & Related papers (2022-05-19T09:48:56Z)
- Imitation Learning from MPC for Quadrupedal Multi-Gait Control [63.617157490920505]
We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot.
We use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control.
We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.
arXiv Detail & Related papers (2021-03-26T08:48:53Z)
- Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
arXiv Detail & Related papers (2020-12-30T03:22:35Z)
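For the Generalized Policy Improvement entry referenced above, the following is a minimal sketch of plain GPI over a library of policies with successor features: act greedily with respect to the best scalarized value max_i psi^{pi_i}(s, a) . w across the library. This is the standard GPI construction, not that paper's prioritization scheme, and the toy dimensions, random features, and names are assumptions made here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: a library of 4 previously learned policies, 5 actions, 3 objectives.
# psi[i, a, :] stands in for the successor features psi^{pi_i}(s, a) of policy i
# at the current state (in practice these come from learned successor-feature networks).
n_policies, n_actions, n_obj = 4, 5, 3
psi = rng.uniform(0.0, 1.0, size=(n_policies, n_actions, n_obj))


def gpi_action(psi_s, w):
    """Generalized Policy Improvement: pick the action that is greedy with respect
    to the best value across the library, Q_w(s, a) = max_i psi^{pi_i}(s, a) . w."""
    q = psi_s @ w                 # shape (n_policies, n_actions)
    return int(np.argmax(q.max(axis=0)))


w = np.array([0.2, 0.5, 0.3])     # preference / task weight vector
print("GPI action for weights", w, "->", gpi_action(psi, w))
```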