Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework
- URL: http://arxiv.org/abs/2006.06193v3
- Date: Thu, 10 Dec 2020 15:16:54 GMT
- Title: Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework
- Authors: Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li
- Abstract summary: We consider a reward-free reinforcement learning framework that separates exploration from exploitation.
In the exploration phase, the agent learns an exploratory policy by interacting with a reward-free environment.
In the planning phase, the agent computes a good policy for any reward function based on the dataset.
- Score: 28.430845498323745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration is essential for reinforcement learning (RL). To face the
challenges of exploration, we consider a reward-free RL framework that
completely separates exploration from exploitation and brings new challenges
for exploration algorithms. In the exploration phase, the agent learns an
exploratory policy by interacting with a reward-free environment and collects a
dataset of transitions by executing the policy. In the planning phase, the
agent computes a good policy for any reward function based on the dataset
without further interacting with the environment. This framework is suitable
for the meta RL setting where there are many reward functions of interest. In
the exploration phase, we propose to maximize the Rényi entropy over the
state-action space and justify this objective theoretically. Rényi entropy
succeeds as an objective because it encourages exploration of hard-to-reach
state-actions. We further deduce a policy gradient
formulation for this objective and design a practical exploration algorithm
that can deal with complex environments. In the planning phase, we solve for
good policies given arbitrary reward functions using a batch RL algorithm.
Empirically, we show that our exploration algorithm is effective and sample
efficient, and results in superior policies for arbitrary reward functions in
the planning phase.
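To make the exploration objective concrete, below is a minimal sketch (not the authors' implementation) of a Rényi entropy of order alpha over a discrete state-action visitation distribution, together with an intrinsic exploration bonus proportional to d(s,a)^(alpha-1). The bonus form is an assumption motivated by the abstract's policy-gradient formulation; the function names and toy counts are illustrative only.

```python
import numpy as np

def renyi_entropy(d, alpha=0.5, eps=1e-12):
    # Rényi entropy of order alpha for a discrete distribution d (alpha != 1):
    # H_alpha(d) = (1 / (1 - alpha)) * log(sum_i d_i^alpha)
    d = np.asarray(d, dtype=np.float64)
    return np.log(np.sum((d + eps) ** alpha)) / (1.0 - alpha)

def intrinsic_bonus(d, alpha=0.5, eps=1e-12):
    # Assumed per-(s,a) exploration bonus proportional to d(s,a)^(alpha - 1).
    # For alpha < 1 it is large where the visitation density d is small,
    # i.e. it favors hard-to-reach state-action pairs.
    d = np.asarray(d, dtype=np.float64)
    return (d + eps) ** (alpha - 1.0)

# Toy example: visit counts for four state-action pairs under the current policy.
counts = np.array([50.0, 30.0, 15.0, 5.0])
d = counts / counts.sum()  # empirical state-action visitation distribution
print("H_0.5(d):", renyi_entropy(d, alpha=0.5))
print("bonuses :", intrinsic_bonus(d, alpha=0.5))
```

In a full method, such bonuses would play the role of rewards in an ordinary policy-gradient update during the reward-free exploration phase, while the planning phase would train on the logged transitions, relabeled with the true reward, using a batch RL algorithm.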
Related papers
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores across all tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach that maximizes the deviation of the next policy's occupancy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization [43.51553742077343]
Inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration.
This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions consistent with the expert demonstrations.
arXiv Detail & Related papers (2020-11-17T10:17:45Z)
- Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples [31.31937554018045]
Deep reinforcement learning (RL) methods require large amounts of exploration data from the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.
arXiv Detail & Related papers (2020-06-28T21:13:08Z)
- On Reward-Free Reinforcement Learning with Linear Function Approximation [144.4210285338698]
Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest.
In this work, we give both positive and negative results for reward-free RL with linear function approximation.
arXiv Detail & Related papers (2020-06-19T17:59:36Z)
- Reward-Free Exploration for Reinforcement Learning [82.3300753751066]
We propose a new "reward-free RL" framework to isolate the challenges of exploration.
We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2 A\,\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration.
We also give a nearly matching $\Omega(S^2 A H^2/\epsilon^2)$ lower bound, demonstrating the near-optimality of our algorithm in this setting.
arXiv Detail & Related papers (2020-02-07T14:03:38Z)