Sample Efficient Deep Reinforcement Learning via Local Planning
- URL: http://arxiv.org/abs/2301.12579v2
- Date: Mon, 3 Jul 2023 04:36:44 GMT
- Title: Sample Efficient Deep Reinforcement Learning via Local Planning
- Authors: Dong Yin, Sridhar Thiagarajan, Nevena Lazic, Nived Rajaraman, Botao
Hao, Csaba Szepesvari
- Abstract summary: This work focuses on sample-efficient deep reinforcement learning (RL) with a simulator.
We propose an algorithmic framework, named uncertainty-first local planning (UFLP), that takes advantage of this property.
We demonstrate that this simple procedure can dramatically improve the sample cost of several baseline RL algorithms on difficult exploration tasks.
- Score: 21.420851589712626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The focus of this work is sample-efficient deep reinforcement learning (RL)
with a simulator. One useful property of simulators is that it is typically
easy to reset the environment to a previously observed state. We propose an
algorithmic framework, named uncertainty-first local planning (UFLP), that
takes advantage of this property. Concretely, in each data collection
iteration, with some probability, our meta-algorithm resets the environment to
an observed state which has high uncertainty, instead of sampling according to
the initial-state distribution. The agent-environment interaction then proceeds
as in the standard online RL setting. We demonstrate that this simple procedure
can dramatically improve the sample cost of several baseline RL algorithms on
difficult exploration tasks. Notably, with our framework, we can achieve
super-human performance on the notoriously hard Atari game, Montezuma's
Revenge, with a simple (distributional) double DQN. Our work can be seen as an
efficient approximate implementation of an existing algorithm with theoretical
guarantees, which offers an interpretation of the positive empirical results.
Related papers
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z) - The Power of Resets in Online Reinforcement Learning [73.64852266145387]
We explore the power of simulators through online reinforcement learning with local simulator access (or, local planning)
We show that MDPs with low coverability can be learned in a sample-efficient fashion with only $Qstar$-realizability.
We show that the notorious Exogenous Block MDP problem is tractable under local simulator access.
arXiv Detail & Related papers (2024-04-23T18:09:53Z) - Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm [14.517103323409307]
Sim-to-real gap represents disparity between training and testing environments.
A promising approach to addressing this challenge is distributionally robust RL.
We tackle robust RL via interactive data collection and present an algorithm with a provable sample complexity guarantee.
arXiv Detail & Related papers (2024-04-04T16:40:22Z) - Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement
Learning Using Unique Experiences [8.983448736644382]
Efficient utilization of the replay buffer plays a significant role in the off-policy actor-critic reinforcement learning (RL) algorithms.
We propose a method for achieving sample efficiency, which focuses on selecting unique samples and adding them to the replay buffer.
arXiv Detail & Related papers (2024-02-05T10:04:00Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning,
and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called textttMEX.
textttMEX integrates estimation and planning components while balancing exploration exploitation automatically.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - Near-optimal Policy Identification in Active Reinforcement Learning [84.27592560211909]
AE-LSVI is a novel variant of the kernelized least-squares value RL (LSVI) algorithm that combines optimism with pessimism for active exploration.
We show that AE-LSVI outperforms other algorithms in a variety of environments when robustness to the initial state is required.
arXiv Detail & Related papers (2022-12-19T14:46:57Z) - On Reward-Free RL with Kernel and Neural Function Approximations:
Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value
Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.