Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch
- URL: http://arxiv.org/abs/2007.01174v4
- Date: Tue, 30 Nov 2021 16:46:51 GMT
- Title: Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch
- Authors: Luca Viano, Yu-Ting Huang, Parameswaran Kamalaruban, Adrian Weller,
Volkan Cevher
- Abstract summary: We study the inverse reinforcement learning (IRL) problem under a transition dynamics mismatch between the expert and the learner.
We propose a robust MCE IRL algorithm, a principled approach to mitigating this mismatch.
- Score: 60.23815709215807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the inverse reinforcement learning (IRL) problem under a transition
dynamics mismatch between the expert and the learner. Specifically, we consider
the Maximum Causal Entropy (MCE) IRL learner model and provide a tight upper
bound on the learner's performance degradation based on the $\ell_1$-distance
between the transition dynamics of the expert and the learner. Leveraging
insights from the Robust RL literature, we propose a robust MCE IRL algorithm, a principled approach to mitigating this mismatch. Finally, we
empirically demonstrate the stable performance of our algorithm compared to the
standard MCE IRL algorithm under transition dynamics mismatches in both finite
and continuous MDP problems.
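As a rough illustration of the MCE IRL learner model that the bound is stated for, here is a minimal tabular sketch: soft value iteration produces the maximum-causal-entropy policy, and gradient ascent matches the expert's discounted feature expectations. This is a generic textbook-style implementation, not the authors' code, and the dynamics tensor, feature map, and step sizes are all illustrative assumptions:

```python
import numpy as np

def soft_value_iteration(P, r, gamma=0.9, iters=100):
    # P: (A, S, S) transition tensor, r: (S,) state reward.
    # Returns the soft-optimal (max-causal-entropy) policy, shape (A, S).
    nA, nS, _ = P.shape
    V = np.zeros(nS)
    for _ in range(iters):
        Q = r[None, :] + gamma * np.einsum('ast,t->as', P, V)
        m = Q.max(axis=0)
        V = m + np.log(np.exp(Q - m).sum(axis=0))   # soft (log-sum-exp) backup
    return np.exp(Q - V[None, :])

def discounted_occupancy(P, pi, d0, gamma=0.9, horizon=60):
    # Discounted state-visitation frequencies under policy pi from start dist d0.
    d, rho = d0.copy(), np.zeros_like(d0)
    for t in range(horizon):
        rho += gamma**t * d
        d = np.einsum('s,as,ast->t', d, pi, P)
    return rho

def mce_irl(P, Phi, mu_expert, d0, gamma=0.9, lr=0.05, steps=300):
    # Gradient ascent on the MCE IRL log-likelihood: drive the learner's
    # discounted feature expectations (under its own dynamics P) toward
    # the expert's. Phi: (S, K) feature matrix, mu_expert: (K,).
    theta = np.zeros(Phi.shape[1])
    for _ in range(steps):
        pi = soft_value_iteration(P, Phi @ theta, gamma)
        mu = Phi.T @ discounted_occupancy(P, pi, d0, gamma)
        theta += lr * (mu_expert - mu)              # likelihood gradient
    return theta
```

With matched dynamics this recovers a reward whose feature expectations agree with the expert's; the paper's point is precisely what degrades when the `P` used by the learner differs from the expert's.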
Related papers
- Distributionally Robust Off-Dynamics Reinforcement Learning: Provable Efficiency with Linear Function Approximation [8.234072589087095]
We study off-dynamics Reinforcement Learning (RL), where the policy is trained on a source domain and deployed to a distinct target domain.
We provide the first study of online distributionally robust Markov decision processes (DRMDPs) with function approximation for off-dynamics RL.
We introduce DR-LSVI-UCB, the first provably efficient online DRMDP algorithm for off-dynamics RL with function approximation.
arXiv Detail & Related papers (2024-02-23T16:01:44Z)
- Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance [47.42067405054353]
Multi-objective learning (MOL) often arises in emerging machine learning problems.
One of the critical challenges in MOL is the potential conflict among different objectives during the iterative optimization process.
Recent works have developed various dynamic weighting algorithms for MOL such as MGDA and its variants.
arXiv Detail & Related papers (2023-05-31T17:31:56Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which admits both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
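The momentum argument can be illustrated with a hand-written odd pairwise kernel standing in for the paper's learned antisymmetrical continuous convolutions. The kernel k(d) = w · d · exp(−|d|²) and all constants below are illustrative assumptions, not the paper's architecture; the point is only that an odd kernel makes pairwise impulses cancel, so total linear momentum is conserved by construction:

```python
import numpy as np

def antisym_step(x, v, dt=0.05, w=0.5):
    # x, v: (N, D) particle positions and velocities (equal unit masses).
    # Pairwise interaction through an odd (antisymmetric) kernel:
    # k(d) = w * d * exp(-|d|^2) satisfies k(-d) = -k(d), so the impulse
    # particle j exerts on i exactly cancels the one i exerts on j.
    diff = x[:, None, :] - x[None, :, :]                         # (N, N, D)
    k = w * diff * np.exp(-(diff ** 2).sum(-1, keepdims=True))   # pairwise forces
    f = k.sum(axis=1)                                            # net force per particle
    v_new = v + dt * f
    return x + dt * v_new, v_new
```

Because the cancellation holds term by term, conservation is a hard structural constraint rather than a soft training penalty.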
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Weighted Maximum Entropy Inverse Reinforcement Learning [22.269565708490468]
We study inverse reinforcement learning (IRL) and imitation learning (IM).
We propose a new way to improve the learning process by adding the maximum weight function to the entropy framework.
Our framework and algorithms allow learning both a reward (or policy) function and the structure of the entropy terms added to the Markov Decision Processes.
arXiv Detail & Related papers (2022-08-20T06:02:07Z)
- Building Robust Ensembles via Margin Boosting [98.56381714748096]
In adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks.
We develop an algorithm for learning an ensemble with maximum margin.
We show that our algorithm not only outperforms existing ensembling techniques, but also large models trained in an end-to-end fashion.
arXiv Detail & Related papers (2022-06-07T14:55:58Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
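The analytic inner loop in point 2) can be illustrated with plain kernel ridge regression, which replaces MAML-style iterative adaptation on a task's support set with a single linear solve. A generic RBF kernel stands in for the meta-model's NTK here, and the kernel, regularizer, and function names are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Generic RBF kernel as a stand-in for a neural tangent kernel.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def analytic_adapt(X_support, y_support, X_query, lam=1e-3):
    # Closed-form "inner loop": kernel ridge regression on the task's
    # support set replaces iterative gradient-based adaptation.
    K = rbf_kernel(X_support, X_support)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_support)), y_support)
    return rbf_kernel(X_query, X_support) @ alpha
```

In the NTK regime this kind of linear solve is exact for the linearized network, which is what makes dropping the inner loop principled rather than a heuristic shortcut.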
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- Meta Continual Learning via Dynamic Programming [1.0965065178451106]
We develop a new theoretical approach for meta continual learning (MCL).
We mathematically model the learning dynamics using dynamic programming, and we establish conditions of optimality for the MCL problem.
We show that, on benchmark data sets, our theoretically grounded method achieves accuracy better than or comparable to that of existing state-of-the-art methods.
arXiv Detail & Related papers (2020-08-05T16:36:16Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant imitation learning from observation (ILO) algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.