Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning
- URL: http://arxiv.org/abs/2409.20258v1
- Date: Mon, 30 Sep 2024 12:49:10 GMT
- Title: Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning
- Authors: Junlin Lu, Patrick Mannion, Karl Mason
- Abstract summary: This research proposes a dynamic weight-based preference inference algorithm.
It can infer the preferences of agents acting in multi-objective decision-making problems from demonstrations.
- Score: 2.9845592719739127
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Many decision-making problems feature multiple objectives where it is not always possible to know the preferences of a human or agent decision-maker for different objectives. However, demonstrated behaviors from the decision-maker are often available. This research proposes a dynamic weight-based preference inference (DWPI) algorithm that can infer the preferences of agents acting in multi-objective decision-making problems from demonstrations. The proposed algorithm is evaluated on three multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item Gathering, and is compared to two existing preference inference algorithms. Empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time efficiency and inference accuracy. The DWPI algorithm maintains its performance when inferring preferences from sub-optimal demonstrations. Moreover, the DWPI algorithm does not necessitate any interactions with the user during inference - only demonstrations are required. We provide a correctness proof and complexity analysis of the algorithm and statistically evaluate the performance under different representations of demonstrations.
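The inference idea described in the abstract can be illustrated with a minimal, hypothetical sketch: assuming a finite set of candidate preference weights and, for each weight, the multi-objective return achieved by the corresponding optimal policy (all values here are invented for illustration; this is not the paper's exact DWPI procedure), a weight can be inferred by matching the demonstrated return vector to the nearest optimal return.

```python
# Hypothetical sketch of linear preference inference from a demonstration.
# Candidate weights and per-weight optimal return vectors are assumed given
# (e.g., precomputed by training a policy per weight); not the paper's DWPI.
import numpy as np

def infer_weights(demo_return, candidate_weights, optimal_returns):
    """Return the candidate weight whose optimal multi-objective return
    vector is closest (Euclidean distance) to the demonstrated return."""
    dists = [np.linalg.norm(demo_return - r) for r in optimal_returns]
    return candidate_weights[int(np.argmin(dists))]

# Toy example: two objectives, three candidate preference weights.
candidates = [np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.1, 0.9])]
# Assumed optimal return vectors under each candidate weight (illustrative).
returns = [np.array([10.0, 1.0]), np.array([6.0, 6.0]), np.array([1.0, 10.0])]
demo = np.array([5.5, 6.2])  # observed demonstration return
print(infer_weights(demo, candidates, returns))  # closest candidate: [0.5 0.5]
```

This nearest-return matching is a crude stand-in for the learned mapping the paper describes, but it captures the core idea of recovering a scalarization weight from demonstrated behavior alone, without querying the user.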
Related papers
- Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning [18.58278188791548]
In-context learning can help Large Language Models (LLMs) adapt to new tasks without additional training.
Despite the many proposed demonstration selection algorithms, their efficiency and effectiveness remain unclear.
This lack of clarity makes it difficult to apply these algorithms in real-world scenarios.
arXiv Detail & Related papers (2024-10-30T15:11:58Z)
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
- Preference Inference from Demonstration in Multi-objective Multi-agent Decision Making [0.0]
We propose an algorithm to infer linear preference weights from either optimal or near-optimal demonstrations.
Empirical results demonstrate significant improvements compared to the baseline algorithms.
In future work, we plan to evaluate the algorithm's effectiveness in a multi-agent system.
arXiv Detail & Related papers (2023-04-27T12:19:28Z)
- Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning: A Dynamic Weight-based Approach [0.0]
In multi-objective decision-making, preference inference is the process of inferring the preferences of a decision-maker for different objectives.
This research proposes a Dynamic Weight-based Preference Inference algorithm that can infer the preferences of agents acting in multi-objective decision-making problems.
arXiv Detail & Related papers (2023-04-27T11:55:07Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- Uncertainty-Aware Search Framework for Multi-Objective Bayesian Optimization [40.40632890861706]
We consider the problem of multi-objective (MO) blackbox optimization using expensive function evaluations.
We propose a novel uncertainty-aware search framework referred to as USeMO to efficiently select the sequence of inputs for evaluation.
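As a loose, hypothetical illustration of an uncertainty-aware selection step for multi-objective Bayesian optimization (not the paper's exact USeMO procedure), the sketch below scores each candidate input with an optimistic upper-confidence acquisition per objective, restricts attention to the Pareto-optimal candidates, and picks the one with the largest uncertainty hyper-rectangle volume. All surrogate means and standard deviations are invented for illustration.

```python
# Hedged sketch: uncertainty-aware candidate selection for multi-objective
# Bayesian optimization. Surrogate means/stds per objective are assumed to
# come from a fitted model (here they are made-up numbers).
import numpy as np

def pareto_mask(points):
    """Boolean mask of non-dominated rows, assuming maximization."""
    n = len(points)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(points[j] >= points[i]) and np.any(points[j] > points[i]):
                mask[i] = False
                break
    return mask

def select_next(means, stds, beta=2.0):
    ucb = means + beta * stds              # optimistic acquisition per objective
    front = np.where(pareto_mask(ucb))[0]  # Pareto-optimal candidates only
    volumes = np.prod(2 * beta * stds[front], axis=1)  # uncertainty box volume
    return front[int(np.argmax(volumes))]

# Three candidates, two objectives (illustrative surrogate estimates).
means = np.array([[1.0, 2.0], [2.0, 1.0], [0.5, 0.5]])
stds  = np.array([[0.3, 0.1], [0.1, 0.4], [0.05, 0.05]])
print(select_next(means, stds))  # selects candidate 1 (largest uncertainty box on the front)
```

Restricting the choice to the acquisition Pareto front and then breaking ties by uncertainty is one simple way to balance exploitation across objectives with exploration, which is the spirit of the framework described above.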
arXiv Detail & Related papers (2022-04-12T16:50:48Z)
- An Efficient Multi-Indicator and Many-Objective Optimization Algorithm based on Two-Archive [7.7415390727490445]
This paper proposes an indicator-based multi-objective optimization algorithm based on two-archive (SRA3).
It can efficiently select good individuals in environment selection based on indicator performance, and uses an adaptive parameter strategy for parental selection without setting additional parameters.
Experiments on the DTLZ and WFG problems show that SRA3 has good convergence and diversity while maintaining high efficiency.
arXiv Detail & Related papers (2022-01-14T13:09:50Z)
- A survey on multi-objective hyperparameter optimization algorithms for Machine Learning [62.997667081978825]
This article presents a systematic survey of the literature published between 2014 and 2020 on multi-objective HPO algorithms.
We distinguish between metaheuristic-based algorithms, metamodel-based algorithms, and approaches using a mixture of both.
We also discuss the quality metrics used to compare multi-objective HPO procedures and present future research directions.
arXiv Detail & Related papers (2021-11-23T10:22:30Z)
- Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.