Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments
- URL: http://arxiv.org/abs/2410.03847v1
- Date: Fri, 4 Oct 2024 18:27:37 GMT
- Title: Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments
- Authors: Simon Sinong Zhan, Qingyuan Wu, Philip Wang, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu
- Abstract summary: We tackle the limitation of the Adversarial Inverse Reinforcement Learning (AIRL) method in stochastic environments, where its theoretical results cannot hold and performance is degraded.
We propose a novel method that infuses dynamics information into reward shaping, with a theoretical guarantee for the induced optimal policy in stochastic environments.
We present a novel Model-Enhanced AIRL framework, which integrates transition model estimation directly into reward shaping.
- Score: 11.088387316161064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we aim to tackle the limitation of the Adversarial Inverse Reinforcement Learning (AIRL) method in stochastic environments where theoretical results cannot hold and performance is degraded. To address this issue, we propose a novel method which infuses the dynamics information into the reward shaping with the theoretical guarantee for the induced optimal policy in the stochastic environments. Incorporating our novel model-enhanced rewards, we present a novel Model-Enhanced AIRL framework, which integrates transition model estimation directly into reward shaping. Furthermore, we provide a comprehensive theoretical analysis of the reward error bound and performance difference bound for our method. The experimental results in MuJoCo benchmarks show that our method can achieve superior performance in stochastic environments and competitive performance in deterministic environments, with significant improvement in sample efficiency, compared to existing baselines.
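The abstract describes integrating an estimated transition model directly into AIRL's reward shaping. The sketch below is a rough, hypothetical illustration of that idea, not the authors' implementation: the decomposition g(s, a) + gamma * h(s') - h(s) is the standard AIRL reward structure, while the Gaussian transition model, the Monte Carlo expectation over predicted next states, and all class and parameter names (ModelEnhancedAIRLReward, n_model_samples, etc.) are assumptions made here for illustration.

```python
import torch
import torch.nn as nn

class ModelEnhancedAIRLReward(nn.Module):
    """Sketch of an AIRL-style shaped reward whose shaping term uses a learned
    transition model instead of the single observed next state (assumed design)."""

    def __init__(self, state_dim, action_dim, hidden=64, gamma=0.99, n_model_samples=8):
        super().__init__()
        self.gamma = gamma
        self.n_model_samples = n_model_samples
        # g(s, a): state-action reward approximator (standard AIRL component)
        self.g = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # h(s): shaping potential (standard AIRL component)
        self.h = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # T_hat(s' | s, a): assumed Gaussian transition model (mean and log-std)
        self.model = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * state_dim))

    def expected_next_potential(self, s, a):
        """Monte Carlo estimate of E_{s' ~ T_hat(.|s,a)}[h(s')]."""
        mean, log_std = self.model(torch.cat([s, a], dim=-1)).chunk(2, dim=-1)
        std = log_std.clamp(-5.0, 2.0).exp()
        samples = mean.unsqueeze(0) + std.unsqueeze(0) * torch.randn(
            self.n_model_samples, *mean.shape)
        return self.h(samples).mean(dim=0)

    def forward(self, s, a):
        # Model-enhanced shaped reward: g(s, a) + gamma * E[h(s')] - h(s)
        return (self.g(torch.cat([s, a], dim=-1))
                + self.gamma * self.expected_next_potential(s, a)
                - self.h(s))
```

Replacing the single observed next state with an expectation under the learned model is one plausible way to make the shaping term robust to transition noise, which is the stochastic setting the paper targets; the actual formulation and guarantees should be taken from the full text.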
Related papers
- Enhancing Fairness through Reweighting: A Path to Attain the Sufficiency Rule [23.335423207588466]
We introduce an innovative approach to enhancing the empirical risk minimization process in model training.
This scheme aims to uphold the sufficiency rule in fairness by ensuring that optimal predictors maintain consistency across diverse sub-groups.
arXiv Detail & Related papers (2024-08-26T09:19:58Z)
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead.
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
- A Bayesian Approach to Robust Inverse Reinforcement Learning [54.24816623644148]
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL).
The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics.
Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed to have a highly accurate model of the environment.
arXiv Detail & Related papers (2023-09-15T17:37:09Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
The bounds we derive reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Discriminator Augmented Model-Based Reinforcement Learning [47.094522301093775]
It is common in practice for the learned model to be inaccurate, impairing planning and leading to poor performance.
This paper aims to improve planning with an importance sampling framework that accounts for the discrepancy between the true and learned dynamics.
arXiv Detail & Related papers (2021-03-24T06:01:55Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)