Conditional Kernel Imitation Learning for Continuous State Environments
- URL: http://arxiv.org/abs/2308.12573v1
- Date: Thu, 24 Aug 2023 05:26:42 GMT
- Title: Conditional Kernel Imitation Learning for Continuous State Environments
- Authors: Rishabh Agrawal, Nathan Dahlin, Rahul Jain, Ashutosh Nayyar
- Abstract summary: We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
- Score: 9.750698192309978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation Learning (IL) is an important paradigm within the broader
reinforcement learning (RL) methodology. Unlike most RL methods, it does not assume
availability of reward feedback. Reward inference and shaping are known to be
difficult and error-prone, particularly when the demonstration data
comes from human experts. Classical methods such as behavioral cloning and
inverse reinforcement learning are highly sensitive to estimation errors, an
issue that is particularly acute in continuous state spaces.
Meanwhile, state-of-the-art IL algorithms convert behavioral policy learning
problems into distribution-matching problems, which often require additional
online interaction data to be effective. In this paper, we consider the problem
of imitation learning in continuous state space environments based solely on
observed behavior, without access to transition dynamics information, reward
structure, or, most importantly, any additional interactions with the
environment. Our approach is based on the Markov balance equation and
introduces a novel conditional kernel density estimation-based imitation
learning framework. It involves estimating the environment's transition
dynamics using conditional kernel density estimators and seeks to satisfy the
probabilistic balance equations for the environment. We establish that our
estimators satisfy basic asymptotic consistency requirements. Through a series
of numerical experiments on continuous state benchmark environments, we show
consistently superior empirical performance over many state-of-the-art IL
algorithms.
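As a rough illustration of the estimation step described in the abstract: the balance condition being targeted is, informally, that the state distribution observed in the demonstrations should be reproduced when the estimated transition density p(s'|s, a) is composed with the learned policy. The sketch below shows only the conditional kernel density estimation ingredient, in a Nadaraya-Watson style, with made-up data, a shared bandwidth, and hypothetical variable names; it is a hedged sketch, not the authors' implementation.
```python
import numpy as np

def gaussian_kernel(u):
    """Product Gaussian kernel evaluated row-wise on scaled differences."""
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2 * np.pi) ** (u.shape[-1] / 2)

def conditional_kde(s_q, a_q, s_next_q, S, A, S_next, h=0.3):
    """Nadaraya-Watson style estimate of p(s' | s, a) from demonstration triples.

    S, A, S_next have shapes (n, d_s), (n, d_a), (n, d_s); h is a shared
    bandwidth (an assumption; in practice it would be tuned, e.g. by
    cross-validation, as is common for kernel density estimators).
    """
    # Weight each stored transition by how close its (s, a) is to the query.
    w = gaussian_kernel((S - s_q) / h) * gaussian_kernel((A - a_q) / h)
    # Kernel density of the queried next state around each stored s'.
    k_next = gaussian_kernel((S_next - s_next_q) / h) / h ** S_next.shape[1]
    # Conditional density: weighted average of the next-state kernels.
    return float(np.sum(w * k_next) / (np.sum(w) + 1e-12))

# Toy 1-D demonstration data (purely illustrative).
rng = np.random.default_rng(0)
S = rng.normal(size=(500, 1))
A = rng.normal(size=(500, 1))
S_next = S + 0.1 * A + 0.05 * rng.normal(size=(500, 1))
print(conditional_kde(np.zeros(1), np.zeros(1), np.zeros(1), S, A, S_next))
```
The same kind of estimator, applied to (s, a) pairs alone, would give a behavioral policy density that can then be checked against the balance condition.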
Related papers
- Markov Balance Satisfaction Improves Performance in Strictly Batch Offline Imitation Learning [8.92571113137362]
We address a scenario where the imitator relies solely on observed behavior and cannot make environmental interactions during learning.
Unlike state-of-the-art (SOTA) IL methods, this approach tackles the limitations of conventional IL by operating in a more constrained and realistic setting.
We demonstrate consistently superior empirical performance compared to many SOTA IL algorithms.
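For intuition about what "Markov balance satisfaction" means operationally, here is a hedged, finite-state stand-in (the papers themselves work in continuous spaces): given an estimated transition kernel and a candidate imitation policy, one can measure how far the induced next-state distribution drifts from the state distribution seen in the demonstrations. All quantities below are randomly generated placeholders.
```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA = 5, 3

# Randomly generated stand-ins for the estimated transition kernel P(s'|s,a),
# a candidate imitation policy pi(a|s), and the demonstration state distribution.
P = rng.random((nS, nA, nS))
P /= P.sum(axis=-1, keepdims=True)
pi = rng.random((nS, nA))
pi /= pi.sum(axis=-1, keepdims=True)
rho = rng.random(nS)
rho /= rho.sum()

# Balance residual: distance between the next-state distribution induced by
# (rho, pi, P) and the observed state distribution. An imitator that exactly
# matches a stationary expert would drive this residual to zero.
rho_next = np.einsum('s,sa,sab->b', rho, pi, P)
print(np.abs(rho_next - rho).sum())
```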
arXiv Detail & Related papers (2024-08-17T07:17:19Z)
- Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z)
- Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition processes, and the deviation of real-world dynamics at deployment from the training environment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
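The summary above names two standard ingredients, Gaussian Process dynamics models and maximum-variance data acquisition, that can be sketched generically. The snippet below fits one GP per output dimension of a toy transition function and queries the candidate input with the largest total predictive variance; it is a hedged illustration of those generic ingredients, not the paper's distributionally robust algorithm, and all names and constants are assumptions.
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)

# Hypothetical 2-D state, 1-D action dynamics: s' = f(s, a) + noise.
def true_dynamics(x):
    s, a = x[:, :2], x[:, 2:]
    return np.hstack([s[:, :1] + 0.1 * a, s[:, 1:] - 0.05 * a])

X = rng.uniform(-1, 1, size=(30, 3))            # observed (s, a) inputs
Y = true_dynamics(X) + 0.01 * rng.normal(size=(30, 2))

# One GP per output dimension of the nominal transition model.
kernel = RBF(length_scale=0.5) + WhiteKernel(noise_level=1e-4)
gps = [GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, Y[:, d])
       for d in range(Y.shape[1])]

# Maximum-variance acquisition: pick the candidate (s, a) whose total
# predictive variance across output dimensions is largest.
candidates = rng.uniform(-1, 1, size=(200, 3))
total_var = sum(gp.predict(candidates, return_std=True)[1] ** 2 for gp in gps)
next_query = candidates[np.argmax(total_var)]
print("next (s, a) to query:", next_query)
```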
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing conservation of linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetric continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
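The reason a hard constraint of this form works is simple: if each particle's update is a sum of pairwise terms that are antisymmetric under exchanging the two particles, the terms cancel in pairs, so the total linear momentum cannot change. A minimal NumPy sketch of that cancellation, using a toy antisymmetric pairwise function rather than the paper's continuous convolution layers:
```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 8, 3
pos = rng.normal(size=(n, d))      # particle positions
vel = rng.normal(size=(n, d))      # particle velocities

def pairwise_update(pos):
    """Antisymmetric pairwise 'message': m(i, j) = -m(j, i).

    Any function of the displacement that is odd in (x_i - x_j) has this
    property; here we use a fixed toy example in place of a learned layer.
    """
    diff = pos[:, None, :] - pos[None, :, :]          # (n, n, d), antisymmetric
    weight = np.exp(-np.sum(diff ** 2, axis=-1, keepdims=True))
    return np.sum(weight * diff, axis=1)              # per-particle velocity change

vel_new = vel + pairwise_update(pos)

# Each pair contributes equal and opposite terms, so the summed change is
# (numerically) zero and total momentum is conserved by construction.
print(np.allclose(vel.sum(axis=0), vel_new.sum(axis=0)))   # True
```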
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Bellman Meets Hawkes: Model-Based Reinforcement Learning via Temporal Point Processes [8.710154439846816]
We consider a sequential decision making problem where the agent faces the environment characterized by discrete events.
This problem exists ubiquitously in social media, finance and health informatics but is rarely investigated by the conventional research in reinforcement learning.
We present a novel framework of model-based reinforcement learning where the agent's actions and observations are asynchronous discrete events occurring in continuous time.
arXiv Detail & Related papers (2022-01-29T11:53:40Z)
- Towards Robust Bisimulation Metric Learning [3.42658286826597]
Bisimulation metrics offer one solution to the representation learning problem.
We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies.
We find that these issues stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
arXiv Detail & Related papers (2021-10-27T00:32:07Z)
- Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z)
- IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is widely used due to its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
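One piece of this that can be sketched without the training details: once a single soft Q-function is in hand, the imitation policy is recovered directly as a softmax over Q-values, with no separate reward model or discriminator. The snippet below shows only that recovery step for a discrete action set, with a random linear map standing in for a trained Q-network; it is a hedged illustration, not the IQ-Learn training objective.
```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical learned soft Q-function for 4 discrete actions over a 6-D state,
# represented here by a random linear map purely for illustration.
W = rng.normal(size=(6, 4))

def soft_q(state):
    return state @ W                      # Q(s, a) for each action

def policy(state, alpha=1.0):
    """Soft (max-entropy) policy: pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    q = soft_q(state) / alpha
    q -= q.max()                          # subtract max for numerical stability
    p = np.exp(q)
    return p / p.sum()

state = rng.normal(size=6)
print(policy(state))                      # action probabilities summing to 1
```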
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)