Related papers: Toward Computationally Efficient Inverse Reinforcement Learning via Reward Shaping

Toward Computationally Efficient Inverse Reinforcement Learning via Reward Shaping

URL: http://arxiv.org/abs/2312.09983v2
Date: Mon, 18 Dec 2023 14:05:56 GMT
Title: Toward Computationally Efficient Inverse Reinforcement Learning via Reward Shaping
Authors: Lauren H. Cooke, Harvey Klyne, Edwin Zhang, Cassidy Laidlaw, Milind Tambe, Finale Doshi-Velez
Abstract summary: This work motivates the use of potential-based reward shaping to reduce the computational burden of each RL sub-problem. This work serves as a proof-of-concept and we hope will inspire future developments towards computationally efficient IRL.
Score: 42.09724642733125
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Inverse reinforcement learning (IRL) is computationally challenging, with common approaches requiring the solution of multiple reinforcement learning (RL) sub-problems. This work motivates the use of potential-based reward shaping to reduce the computational burden of each RL sub-problem. This work serves as a proof-of-concept and we hope will inspire future developments towards computationally efficient IRL.

Related papers

Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs [58.18140409409302]
Large Language Models (LLMs) have made substantial strides in structured tasks through Reinforcement Learning (RL) Applying RL in broader domains like chatbots and content generation presents unique challenges. We show a case study of reproducing existing reward model ensemble research using embedding-based reward models.
arXiv Detail & Related papers (2025-02-04T19:37:35Z)
Umbrella Reinforcement Learning -- computationally efficient tool for hard non-linear problems [0.0]
The approach is realized on the basis of neural networks, with the use of policy gradient. It outperforms, by computational efficiency and implementation universality, all available state-of-the-art algorithms, in application to hard RL problems with sparse reward, state traps and lack of terminal states.
arXiv Detail & Related papers (2024-11-21T13:34:36Z)
Hybrid Inverse Reinforcement Learning [34.793570631021005]
inverse reinforcement learning approach to imitation learning is a double-edged sword. We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration. We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
arXiv Detail & Related papers (2024-02-13T23:29:09Z)
The Virtues of Pessimism in Inverse Reinforcement Learning [38.98656220917943]
Inverse Reinforcement Learning is a powerful framework for learning complex behaviors from expert demonstrations. It is desirable to reduce the exploration burden by leveraging expert demonstrations in the inner-loop RL. We consider an alternative approach to speeding up the RL in IRL: emphpessimism, i.e., staying close to the expert's data distribution, instantiated via the use of offline RL algorithms.
arXiv Detail & Related papers (2024-02-04T21:22:29Z)
Inverse Reinforcement Learning without Reinforcement Learning [40.7783129322142]
Inverse Reinforcement Learning (IRL) aims to learn a reward function that rationalizes expert demonstrations. Traditional IRL methods require repeatedly solving a hard reinforcement learning problem as a subroutine. We have reduced the easier problem of imitation learning to repeatedly solving the harder problem of RL.
arXiv Detail & Related papers (2023-03-26T04:35:53Z)
Pretraining in Deep Reinforcement Learning: A Survey [17.38360092869849]
Pretraining has shown to be effective in acquiring transferable knowledge. Due to the nature of reinforcement learning, pretraining in this field is faced with unique challenges.
arXiv Detail & Related papers (2022-11-08T02:17:54Z)
DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide and conquer framework (DCF) Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs. We also exploit another attention based policy network in our lower-level DRL model to construct the route for each UAV, with the objective to maximize the number of executed tasks.
arXiv Detail & Related papers (2022-08-04T04:35:53Z)
Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space. The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning [58.66067369294337]
When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions. We adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories. Our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks.
arXiv Detail & Related papers (2021-04-09T03:13:35Z)
Return-Based Contrastive Representation Learning for Reinforcement Learning [126.7440353288838]
We propose a novel auxiliary task that forces the learnt representations to discriminate state-action pairs with different returns. Our algorithm outperforms strong baselines on complex tasks in Atari games and DeepMind Control suite.
arXiv Detail & Related papers (2021-02-22T13:04:18Z)
Reinforcement Learning through Active Inference [62.997667081978825]
We show how ideas from active inference can augment traditional reinforcement learning approaches. We develop and implement a novel objective for decision making, which we term the free energy of the expected future. We demonstrate that the resulting algorithm successfully exploration and exploitation, simultaneously achieving robust performance on several challenging RL benchmarks with sparse, well-shaped, and no rewards.
arXiv Detail & Related papers (2020-02-28T10:28:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.