Probability Density Estimation Based Imitation Learning
- URL: http://arxiv.org/abs/2112.06746v1
- Date: Mon, 13 Dec 2021 15:55:38 GMT
- Title: Probability Density Estimation Based Imitation Learning
- Authors: Yang Liu, Yongzhe Chang, Shilei Jiang, Xueqian Wang, Bin Liang, Bo Yuan
- Abstract summary: Imitation Learning (IL) is an effective learning paradigm exploiting the interactions between agents and environments.
In this work, a novel reward function based on probability density estimation is proposed for IRL.
We present a "watch-try-learn" style framework named Probability Density Estimation based Imitation Learning (PDEIL).
- Score: 11.262633728487165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation Learning (IL) is an effective learning paradigm exploiting the
interactions between agents and environments. It does not require explicit
reward signals and instead tries to recover desired policies using expert
demonstrations. In general, IL methods can be categorized into Behavioral
Cloning (BC) and Inverse Reinforcement Learning (IRL). In this work, a novel
reward function based on probability density estimation is proposed for IRL,
which can significantly reduce the complexity of existing IRL methods.
Furthermore, we prove that the theoretically optimal policy derived from our
reward function is identical to the expert policy as long as the expert policy
is deterministic. Consequently, an IRL problem can be gracefully transformed into
a probability density estimation problem. Based on the proposed reward
function, we present a "watch-try-learn" style framework named Probability
Density Estimation based Imitation Learning (PDEIL), which can work in both
discrete and continuous action spaces. Finally, comprehensive experiments in
the Gym environment show that PDEIL is much more efficient than existing
algorithms in recovering rewards close to the ground truth.
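For intuition, a minimal sketch of the density-estimation idea is given below, assuming the reward of a state-action pair is derived from its estimated density under the expert's demonstrations; the exact reward form, density estimator, and training loop used in PDEIL may differ.

```python
import numpy as np
from scipy.stats import gaussian_kde

def fit_expert_density(expert_states, expert_actions):
    """Fit a kernel density estimate over the expert's (state, action) pairs.

    expert_states:  array of shape (n, state_dim)
    expert_actions: array of shape (n, action_dim)
    """
    data = np.hstack([expert_states, expert_actions])   # (n, state_dim + action_dim)
    return gaussian_kde(data.T)                          # scipy expects shape (dim, n)

def density_reward(kde, state, action, eps=1e-8):
    """Reward of (state, action): log-density under the expert's estimated distribution."""
    x = np.concatenate([state, action])[:, None]         # column vector of shape (dim, 1)
    return float(np.log(kde(x)[0] + eps))

# Usage sketch: plug the reward into any standard RL algorithm.
#   expert_s, expert_a = load_demonstrations()           # hypothetical loader
#   kde = fit_expert_density(expert_s, expert_a)
#   r = density_reward(kde, current_state, current_action)
```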
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
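For concreteness, here is a hedged sketch of the learn-stochastic, deploy-deterministic practice with a linear-Gaussian policy; the parameterization `W` and the exploration level `sigma` are illustrative placeholders, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))       # hypothetical linear policy parameters (action_dim x state_dim)
sigma = 0.3                        # exploration level, the quantity tuned during learning

def stochastic_action(state):
    """Action used while learning: the policy mean plus Gaussian exploration noise."""
    return W @ state + sigma * rng.normal(size=W.shape[0])

def deterministic_action(state):
    """Action actually deployed: the noise-free mean of the learned policy."""
    return W @ state
```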
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems.
This paper provides the first line of results for efficient IRL in vanilla offline and online settings using polynomial samples and runtime.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z)
- Probabilistic Inference in Reinforcement Learning Done Right [37.31057328219418]
A popular perspective in reinforcement learning casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP).
The core object of study is the probability of each state-action pair being visited under the optimal policy; previous approaches to approximating this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference.
We first reveal that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret.
arXiv Detail & Related papers (2023-11-22T10:23:14Z)
- B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding [51.74479522965712]
We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the conditional average treatment effect (CATE) function under limits on hidden confounding.
We prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods.
arXiv Detail & Related papers (2023-04-20T18:07:19Z)
- Kernel Density Bayesian Inverse Reinforcement Learning [5.699034783029326]
Inverse reinforcement learning (IRL) methods infer an agent's reward function using demonstrations of expert behavior.
This work introduces a principled and theoretically grounded framework that enables Bayesian IRL to be applied across a variety of domains.
arXiv Detail & Related papers (2023-03-13T03:00:03Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization [43.51553742077343]
Inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration.
This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions consistent with the expert demonstrations.
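To illustrate the general recipe of Bayesian optimization over reward parameters, the sketch below uses a Gaussian-process surrogate with an expected-improvement acquisition; the objective `demo_nll` is a toy stand-in for the (expensive) fit of reward parameters to the expert demonstrations, and BO-IRL's actual kernel, acquisition, and treatment of multiple consistent rewards differ.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def demo_nll(w):
    """Toy stand-in for the negative log-likelihood of expert demonstrations
    under reward parameters w; in practice this is the expensive black box."""
    return float(np.sum((w - 0.3) ** 2) + 0.05 * rng.normal())

dim, n_init, n_iter = 2, 5, 20
X = rng.uniform(-1, 1, size=(n_init, dim))               # initial reward parameters
y = np.array([demo_nll(w) for w in X])                    # their (noisy) objective values

for _ in range(n_iter):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3,
                                  normalize_y=True).fit(X, y)
    cand = rng.uniform(-1, 1, size=(256, dim))            # random candidate pool
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement (minimization)
    w_next = cand[np.argmax(ei)]                          # query the most promising reward parameters
    X = np.vstack([X, w_next])
    y = np.append(y, demo_nll(w_next))

print("best reward parameters found:", X[np.argmin(y)])
```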
arXiv Detail & Related papers (2020-11-17T10:17:45Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value iteration-style algorithms, one can obtain strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- Bayesian Robust Optimization for Imitation Learning [34.40385583372232]
Inverse reinforcement learning can enable generalization to new states by learning a parameterized reward function.
Existing safe imitation learning approaches based on IRL deal with uncertainty over the true reward function using a maxmin framework.
The proposed Bayesian Robust Optimization for Imitation Learning (BROIL) algorithm instead provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors.
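A small numerical sketch of this interpolation follows, assuming posterior samples over linear reward weights and candidate policies summarized by their expected feature counts; BROIL itself optimizes over policies directly rather than scoring a fixed candidate set, and the lambda and alpha values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Posterior samples over linear reward weights, e.g. from Bayesian IRL.
reward_samples = rng.normal(size=(1000, 3))                  # (n_samples, feature_dim)

# Candidate policies summarized by expected discounted feature counts.
policy_features = {"cautious": np.array([0.2, 0.9, 0.1]),
                   "aggressive": np.array([0.9, 0.1, 0.5])}

def broil_style_objective(mu, lam=0.5, alpha=0.95):
    """lam * expected return + (1 - lam) * CVaR_alpha of the return under the
    reward posterior; lam=1 is pure return maximization, lam=0 pure risk minimization."""
    returns = reward_samples @ mu                            # return under each posterior sample
    k = max(1, int(np.ceil((1 - alpha) * len(returns))))
    cvar = np.sort(returns)[:k].mean()                       # mean of the worst (1 - alpha) tail
    return lam * returns.mean() + (1 - lam) * cvar

best = max(policy_features, key=lambda name: broil_style_objective(policy_features[name]))
print("selected policy:", best)
```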
arXiv Detail & Related papers (2020-07-24T01:52:11Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
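As a simplified, concrete instance of this setting, here is a sketch of off-policy evaluation with linear features via plain LSTD-Q for a deterministic target policy; this is a generic baseline estimator, not the minimax-optimal procedure analyzed in the paper.

```python
import numpy as np

def lstdq_ope(phi, target_policy, transitions, start_states, gamma=0.99, ridge=1e-3):
    """Estimate the value of `target_policy` from logged (s, a, r, s') tuples,
    assuming a linear model Q(s, a) ~= theta^T phi(s, a)."""
    d = phi(*transitions[0][:2]).shape[0]
    A, b = ridge * np.eye(d), np.zeros(d)                    # ridge term for numerical stability
    for s, a, r, s_next in transitions:
        f = phi(s, a)
        f_next = phi(s_next, target_policy(s_next))          # next action drawn from the *target* policy
        A += np.outer(f, f - gamma * f_next)                 # LSTD-Q system: A theta = b
        b += r * f
    theta = np.linalg.solve(A, b)
    # Value estimate: Q of the target policy's action, averaged over start states.
    return float(np.mean([theta @ phi(s, target_policy(s)) for s in start_states]))

# Usage sketch (all inputs are hypothetical):
#   transitions = [(s, a, r, s_next), ...]   # logged by some unknown behavior policy
#   value_hat = lstdq_ope(phi, pi_target, transitions, start_states)
```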
arXiv Detail & Related papers (2020-02-21T19:20:57Z)