Inferring Preferences from Demonstrations in Multi-objective
Reinforcement Learning: A Dynamic Weight-based Approach
- URL: http://arxiv.org/abs/2304.14115v1
- Date: Thu, 27 Apr 2023 11:55:07 GMT
- Title: Inferring Preferences from Demonstrations in Multi-objective
Reinforcement Learning: A Dynamic Weight-based Approach
- Authors: Junlin Lu, Patrick Mannion, Karl Mason
- Abstract summary: In multi-objective decision-making, preference inference is the process of inferring the preferences of a decision-maker for different objectives.
This research proposes a Dynamic Weight-based Preference Inference algorithm that can infer the preferences of agents acting in multi-objective decision-making problems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many decision-making problems feature multiple objectives. In such problems,
it is not always possible to know the preferences of a decision-maker for
different objectives. However, it is often possible to observe the behavior of
decision-makers. In multi-objective decision-making, preference inference is
the process of inferring the preferences of a decision-maker for different
objectives. This research proposes a Dynamic Weight-based Preference Inference
(DWPI) algorithm that can infer the preferences of agents acting in
multi-objective decision-making problems, based on observed behavior
trajectories in the environment. The proposed method is evaluated on three
multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item
Gathering. The performance of the proposed DWPI approach is compared to two
existing preference inference methods from the literature, and empirical
results demonstrate significant improvements compared to the baseline
algorithms, in terms of both time requirements and accuracy of the inferred
preferences. The Dynamic Weight-based Preference Inference algorithm also
maintains its performance when inferring preferences for sub-optimal behavior
demonstrations. In addition to its impressive performance, the Dynamic
Weight-based Preference Inference algorithm does not require any interactions
during training with the agent whose preferences are inferred; all that is
required is a trajectory of observed behavior.
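The abstract does not spell out how DWPI works internally, so the following is not the DWPI algorithm. It is only a minimal sketch of what "inferring preferences" can mean when preferences are assumed to be a linear weight vector over objectives: given the vector return of a demonstration and the returns of some alternative behaviors, it enumerates a grid of weight vectors and keeps those under which the demonstration is at least as good as every alternative. The function names, the grid search, and the two-objective returns are all hypothetical and chosen for illustration.

```python
from itertools import product


def scalarize(weights, returns):
    """Linear utility: weighted sum of a vector-valued return."""
    return sum(w * r for w, r in zip(weights, returns))


def simplex_grid(n_objectives, steps):
    """Weight vectors with entries k/steps that sum to 1."""
    grid = []
    for combo in product(range(steps + 1), repeat=n_objectives):
        if sum(combo) == steps:
            grid.append(tuple(c / steps for c in combo))
    return grid


def consistent_weights(demo_return, alternative_returns, steps=10):
    """Grid weights under which the demonstration is at least as good
    as every alternative (an assumption-laden stand-in, not DWPI)."""
    kept = []
    for w in simplex_grid(len(demo_return), steps):
        u_demo = scalarize(w, demo_return)
        if all(u_demo >= scalarize(w, alt) for alt in alternative_returns):
            kept.append(w)
    return kept


# Hypothetical two-objective returns: (treasure value, time penalty).
demo = (8, -3)
alternatives = [(1, -1), (10, -9)]

if __name__ == "__main__":
    for w in consistent_weights(demo, alternatives):
        print(w)
```

With these made-up numbers, the demonstration is only consistent with weight vectors that put a moderate weight on the first objective; weights that strongly favor either objective are ruled out because one of the alternatives would score higher.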
Related papers
- Dynamic Detection of Relevant Objectives and Adaptation to Preference Drifts in Interactive Evolutionary Multi-Objective Optimization [2.4374097382908477]
We study the dynamic nature of DM preferences, which can evolve throughout the decision-making process and affect the relevance of objectives.
We propose methods to discard outdated or conflicting preferences when such shifts occur.
Our experimental results demonstrate that the proposed methods effectively manage evolving preferences and significantly enhance the quality and desirability of the solutions produced by the algorithm.
arXiv Detail & Related papers (2024-11-07T09:09:06Z)
- Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning [2.9845592719739127]
This research proposes a dynamic weight-based preference inference algorithm.
It can infer the preferences of agents acting in multi-objective decision-making problems from demonstrations.
arXiv Detail & Related papers (2024-09-30T12:49:10Z)
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Deep Pareto Reinforcement Learning for Multi-Objective Recommender Systems [60.91599969408029]
Optimizing multiple objectives simultaneously is an important task for recommendation platforms.
Existing multi-objective recommender systems do not systematically consider such dynamic relationships.
arXiv Detail & Related papers (2024-07-04T02:19:49Z)
- Differentiation of Multi-objective Data-driven Decision Pipeline [34.577809430781144]
Real-world scenarios frequently involve multi-objective data-driven optimization problems.
Traditional two-stage methods apply a machine learning model to estimate problem coefficients, followed by invoking a solver to tackle the predicted optimization problem.
Recent efforts have focused on end-to-end training of predictive models that use decision loss derived from the downstream optimization problem.
arXiv Detail & Related papers (2024-06-02T15:42:03Z)
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
- Behavior-Contextualized Item Preference Modeling for Multi-Behavior Recommendation [30.715182718492244]
This paper introduces a novel approach, Behavior-Contextualized Item Preference Modeling (BCIPM), for multi-behavior recommendation.
Our proposed Behavior-Contextualized Item Preference Network discerns and learns users' specific item preferences within each behavior.
It then considers only those preferences relevant to the target behavior for final recommendations, significantly reducing noise from auxiliary behaviors.
arXiv Detail & Related papers (2024-04-28T12:46:36Z)
- Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z)
- Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML).
Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z)
- Preference Inference from Demonstration in Multi-objective Multi-agent Decision Making [0.0]
We propose an algorithm to infer linear preference weights from either optimal or near-optimal demonstrations.
Empirical results demonstrate significant improvements compared to the baseline algorithms.
In future work, we plan to evaluate the algorithm's effectiveness in a multi-agent system.
arXiv Detail & Related papers (2023-04-27T12:19:28Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)