Inverse Reinforcement Learning with Sub-optimal Experts
- URL: http://arxiv.org/abs/2401.03857v1
- Date: Mon, 8 Jan 2024 12:39:25 GMT
- Title: Inverse Reinforcement Learning with Sub-optimal Experts
- Authors: Riccardo Poiani, Gabriele Curti, Alberto Maria Metelli, Marcello
Restelli
- Abstract summary: We study the theoretical properties of the class of reward functions that are compatible with a given set of experts.
Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards.
We analyze a uniform sampling algorithm that is minimax optimal whenever the sub-optimal experts' performance level is sufficiently close to that of the optimal agent.
- Score: 56.553106680769474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inverse Reinforcement Learning (IRL) techniques deal with the problem of
deducing a reward function that explains the behavior of an expert agent who is
assumed to act optimally in an underlying unknown task. In several problems of
interest, however, it is possible to observe the behavior of multiple experts
with different degrees of optimality (e.g., racing drivers whose skills range
from amateur to professional). For this reason, in this work, we extend the
IRL formulation to problems where, in addition to demonstrations from the
optimal agent, we can observe the behavior of multiple sub-optimal experts.
Given this problem, we first study the theoretical properties of the class of
reward functions that are compatible with a given set of experts, i.e., the
feasible reward set. Our results show that the presence of multiple sub-optimal
experts can significantly shrink the set of compatible rewards. Furthermore, we
study the statistical complexity of estimating the feasible reward set with a
generative model. To this end, we analyze a uniform sampling algorithm that is
minimax optimal whenever the sub-optimal experts' performance level is
sufficiently close to that of the optimal agent.
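To make the abstract's notion of compatibility concrete, the sketch below checks whether a candidate reward belongs to the feasible reward set in a small tabular MDP. The precise conditions used here (the optimal expert's policy achieves the optimal value in every state; each sub-optimal expert loses at most a known amount xi_i in expected return from an initial-state distribution) are assumptions made for illustration and may differ from the paper's formal definitions; all names are hypothetical.

```python
# Minimal sketch (not the paper's implementation) of a feasible-reward-set
# membership check with one optimal expert and several sub-optimal experts.
# Shapes: P is (S, A, S) transition tensor, r is (S, A) candidate reward,
# mu0 is (S,) initial-state distribution, policies are length-S action arrays.
import numpy as np

def policy_value(P, r, pi, gamma):
    """Exact evaluation of a deterministic policy pi (pi[s] = action index)."""
    S = P.shape[0]
    P_pi = P[np.arange(S), pi]          # (S, S): next-state distribution under pi
    r_pi = r[np.arange(S), pi]          # (S,): reward collected under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def optimal_value(P, r, gamma, tol=1e-10, max_iters=100_000):
    """Value iteration for the optimal state-value function."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iters):
        V_new = (r + gamma * P @ V).max(axis=1)   # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

def is_compatible(P, r, mu0, gamma, pi_opt, sub_experts, eps=1e-6):
    """Check candidate reward r against the optimal expert pi_opt and a list
    of (pi_i, xi_i) sub-optimal experts with known sub-optimality bounds."""
    V_star = optimal_value(P, r, gamma)
    # 1) The optimal expert's policy must achieve the optimal value everywhere.
    if np.max(V_star - policy_value(P, r, pi_opt, gamma)) > eps:
        return False
    # 2) Each sub-optimal expert may lose at most xi_i in expected return
    #    from the initial-state distribution mu0.
    J_star = mu0 @ V_star
    for pi_i, xi_i in sub_experts:
        if J_star - mu0 @ policy_value(P, r, pi_i, gamma) > xi_i + eps:
            return False
    return True
```

Under these assumptions, a generative-model estimator in the spirit of the abstract would query every state-action pair uniformly to build an empirical transition model and then apply a check of this kind to candidate rewards; the abstract states only that such a uniform strategy is minimax optimal when the sub-optimal experts perform nearly as well as the optimal agent.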
Related papers
- On Multi-Agent Inverse Reinforcement Learning [8.284137254112848]
We extend the Inverse Reinforcement Learning (IRL) framework to the multi-agent setting, assuming that we observe agents following Nash Equilibrium (NE) policies.
We provide an explicit characterization of the feasible reward set and analyze how errors in estimating the transition dynamics and expert behavior impact the recovered rewards.
arXiv Detail & Related papers (2024-11-22T16:31:36Z)
- Satisficing Exploration for Deep Reinforcement Learning [26.73584163318647]
In complex environments that approach the vastness and scale of the real world, attaining optimal performance may in fact be an entirely intractable endeavor.
Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently-satisfying or satisficing solutions.
We extend an agent that directly represents uncertainty over the optimal value function, allowing it both to bypass the need for model-based planning and to learn satisficing policies.
arXiv Detail & Related papers (2024-07-16T21:28:03Z)
- Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts [78.3687645289918]
We show that the sigmoid gating function enjoys a higher sample efficiency than the softmax gating for the statistical task of expert estimation.
We find that experts formulated as feed-forward networks with commonly used activations such as ReLU and GELU enjoy faster convergence rates under sigmoid gating.
arXiv Detail & Related papers (2024-05-22T21:12:34Z)
- Human-Algorithm Collaborative Bayesian Optimization for Engineering Systems [0.0]
We re-introduce the human into the data-driven decision-making loop by outlining an approach for collaborative Bayesian optimization.
Our methodology exploits the hypothesis that humans are more efficient at making discrete choices than continuous ones.
We demonstrate our approach across a number of applied and numerical case studies including bioprocess optimization and reactor geometry design.
arXiv Detail & Related papers (2024-04-16T23:17:04Z)
- Divide and not forget: Ensemble of selectively trained experts in Continual Learning [0.2886273197127056]
Class-incremental learning is becoming more popular as it helps models widen their applicability while not forgetting what they already know.
A trend in this area is to use a mixture-of-experts technique, where different models work together to solve the task.
SEED selects only one expert, the one best suited to the task at hand, and uses data from this task to fine-tune only this expert.
arXiv Detail & Related papers (2024-01-18T18:25:29Z)
- Active Ranking of Experts Based on their Performances in Many Tasks [72.96112117037465]
We consider the problem of ranking n experts based on their performances on d tasks.
We make a monotonicity assumption stating that for each pair of experts, one outperforms the other on all tasks.
arXiv Detail & Related papers (2023-06-05T06:55:39Z)
- Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice, the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Inverse Reinforcement Learning via Matching of Optimality Profiles [2.561053769852449]
We propose an algorithm that learns a reward function from demonstrations of suboptimal or heterogeneous performance.
We show that our method is capable of learning reward functions such that policies trained to optimize them outperform the demonstrations used for fitting the reward functions.
arXiv Detail & Related papers (2020-11-18T13:23:43Z)