Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch
- URL: http://arxiv.org/abs/2007.01174v4
- Date: Tue, 30 Nov 2021 16:46:51 GMT
- Title: Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch
- Authors: Luca Viano, Yu-Ting Huang, Parameswaran Kamalaruban, Adrian Weller,
Volkan Cevher
- Abstract summary: We study the inverse reinforcement learning (IRL) problem under a transition dynamics mismatch between the expert and the learner.
We propose a robust MCE IRL algorithm, a principled approach to mitigating this mismatch.
- Score: 60.23815709215807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the inverse reinforcement learning (IRL) problem under a transition
dynamics mismatch between the expert and the learner. Specifically, we consider
the Maximum Causal Entropy (MCE) IRL learner model and provide a tight upper
bound on the learner's performance degradation based on the $\ell_1$-distance
between the transition dynamics of the expert and the learner. Leveraging
insights from the Robust RL literature, we propose a robust MCE IRL algorithm, a principled approach to mitigating this mismatch. Finally, we
empirically demonstrate the stable performance of our algorithm compared to the
standard MCE IRL algorithm under transition dynamics mismatches in both finite
and continuous MDP problems.
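As a rough illustration of the MCE IRL learner model that the bound is stated for, here is a minimal tabular sketch: soft value iteration produces the maximum-causal-entropy policy, and gradient ascent matches the expert's discounted feature expectations. This is a generic textbook-style implementation, not the authors' code, and the dynamics tensor, feature map, and step sizes are all illustrative assumptions:

```python
import numpy as np

def soft_value_iteration(P, r, gamma=0.9, iters=100):
    # P: (A, S, S) transition tensor, r: (S,) state reward.
    # Returns the soft-optimal (max-causal-entropy) policy, shape (A, S).
    nA, nS, _ = P.shape
    V = np.zeros(nS)
    for _ in range(iters):
        Q = r[None, :] + gamma * np.einsum('ast,t->as', P, V)
        m = Q.max(axis=0)
        V = m + np.log(np.exp(Q - m).sum(axis=0))   # soft (log-sum-exp) backup
    return np.exp(Q - V[None, :])

def discounted_occupancy(P, pi, d0, gamma=0.9, horizon=60):
    # Discounted state-visitation frequencies under policy pi from start dist d0.
    d, rho = d0.copy(), np.zeros_like(d0)
    for t in range(horizon):
        rho += gamma**t * d
        d = np.einsum('s,as,ast->t', d, pi, P)
    return rho

def mce_irl(P, Phi, mu_expert, d0, gamma=0.9, lr=0.05, steps=300):
    # Gradient ascent on the MCE IRL log-likelihood: drive the learner's
    # discounted feature expectations (under its own dynamics P) toward
    # the expert's. Phi: (S, K) feature matrix, mu_expert: (K,).
    theta = np.zeros(Phi.shape[1])
    for _ in range(steps):
        pi = soft_value_iteration(P, Phi @ theta, gamma)
        mu = Phi.T @ discounted_occupancy(P, pi, d0, gamma)
        theta += lr * (mu_expert - mu)              # likelihood gradient
    return theta
```

With matched dynamics this recovers a reward whose feature expectations agree with the expert's; the paper's point is precisely what degrades when the `P` used by the learner differs from the expert's.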
Related papers
- Distributionally Robust Off-Dynamics Reinforcement Learning: Provable Efficiency with Linear Function Approximation [8.234072589087095]
We study off-dynamics Reinforcement Learning (RL), where the policy is trained on a source domain and deployed to a distinct target domain.
We provide the first study of online distributionally robust Markov decision processes (DRMDPs) with function approximation for off-dynamics RL.
We introduce DR-LSVI-UCB, the first provably efficient online DRMDP algorithm for off-dynamics RL with function approximation.
arXiv Detail & Related papers (2024-02-23T16:01:44Z)
- Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance [47.42067405054353]
Multi-objective learning (MOL) often arises in emerging machine learning problems.
One of the critical challenges in MOL is the potential conflict among different objectives during the iterative optimization process.
Recent works have developed various dynamic weighting algorithms for MOL such as MGDA and its variants.
arXiv Detail & Related papers (2023-05-31T17:31:56Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which admits both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
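The momentum argument can be illustrated with a hand-written odd pairwise kernel standing in for the paper's learned antisymmetrical continuous convolutions. The kernel k(d) = w · d · exp(−|d|²) and all constants below are illustrative assumptions, not the paper's architecture; the point is only that an odd kernel makes pairwise impulses cancel, so total linear momentum is conserved by construction:

```python
import numpy as np

def antisym_step(x, v, dt=0.05, w=0.5):
    # x, v: (N, D) particle positions and velocities (equal unit masses).
    # Pairwise interaction through an odd (antisymmetric) kernel:
    # k(d) = w * d * exp(-|d|^2) satisfies k(-d) = -k(d), so the impulse
    # particle j exerts on i exactly cancels the one i exerts on j.
    diff = x[:, None, :] - x[None, :, :]                         # (N, N, D)
    k = w * diff * np.exp(-(diff ** 2).sum(-1, keepdims=True))   # pairwise forces
    f = k.sum(axis=1)                                            # net force per particle
    v_new = v + dt * f
    return x + dt * v_new, v_new
```

Because the cancellation holds term by term, conservation is a hard structural constraint rather than a soft training penalty.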
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Weighted Maximum Entropy Inverse Reinforcement Learning [22.269565708490468]
We study inverse reinforcement learning (IRL) and imitation learning (IM).
We propose a new way to improve the learning process by adding the maximum weight function to the entropy framework.
Our framework and algorithms allow learning both a reward (or policy) function and the structure of the entropy terms added to the Markov Decision Processes.
arXiv Detail & Related papers (2022-08-20T06:02:07Z)
- Building Robust Ensembles via Margin Boosting [98.56381714748096]
In adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks.
We develop an algorithm for learning an ensemble with maximum margin.
We show that our algorithm not only outperforms existing ensembling techniques, but also large models trained in an end-to-end fashion.
arXiv Detail & Related papers (2022-06-07T14:55:58Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
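The analytic inner loop in point 2) can be illustrated with plain kernel ridge regression, which replaces MAML-style iterative adaptation on a task's support set with a single linear solve. A generic RBF kernel stands in for the meta-model's NTK here, and the kernel, regularizer, and function names are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Generic RBF kernel as a stand-in for a neural tangent kernel.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def analytic_adapt(X_support, y_support, X_query, lam=1e-3):
    # Closed-form "inner loop": kernel ridge regression on the task's
    # support set replaces iterative gradient-based adaptation.
    K = rbf_kernel(X_support, X_support)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_support)), y_support)
    return rbf_kernel(X_query, X_support) @ alpha
```

In the NTK regime this kind of linear solve is exact for the linearized network, which is what makes dropping the inner loop principled rather than a heuristic shortcut.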
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- Meta Continual Learning via Dynamic Programming [1.0965065178451106]
We develop a new theoretical approach for meta continual learning (MCL).
We mathematically model the learning dynamics using dynamic programming, and we establish conditions of optimality for the MCL problem.
We show that, on benchmark data sets, our theoretically grounded method achieves accuracy better than or comparable to that of existing state-of-the-art methods.
arXiv Detail & Related papers (2020-08-05T16:36:16Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant imitation learning from observation (ILO) algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.