Dynamic Subgoal-based Exploration via Bayesian Optimization
- URL: http://arxiv.org/abs/1910.09143v5
- Date: Thu, 12 Oct 2023 17:27:48 GMT
- Title: Dynamic Subgoal-based Exploration via Bayesian Optimization
- Authors: Yijia Wang, Matthias Poloczek, Daniel R. Jiang
- Abstract summary: Reinforcement learning in sparse-reward navigation environments is challenging and poses a need for effective exploration.
We propose a cost-aware Bayesian optimization approach that efficiently searches over a class of dynamic subgoal-based exploration strategies.
An experimental evaluation demonstrates that the new approach outperforms existing baselines across a number of problem domains.
- Score: 7.297146495243708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning in sparse-reward navigation environments with
expensive and limited interactions is challenging and poses a need for
effective exploration. Motivated by complex navigation tasks that require
real-world training (when cheap simulators are not available), we consider an
agent that faces an unknown distribution of environments and must decide on an
exploration strategy. It may leverage a series of training environments to
improve its policy before it is evaluated in a test environment drawn from the
same environment distribution. Most existing approaches focus on fixed
exploration strategies, while the few that view exploration as a
meta-optimization problem tend to ignore the need for cost-efficient
exploration. We propose a cost-aware Bayesian optimization approach that
efficiently searches over a class of dynamic subgoal-based exploration
strategies. The algorithm adjusts a variety of levers -- the locations of the
subgoals, the length of each episode, and the number of replications per trial
-- in order to overcome the challenges of sparse rewards, expensive
interactions, and noise. An experimental evaluation demonstrates that the new
approach outperforms existing baselines across a number of problem domains. We
also provide a theoretical foundation and prove that the method asymptotically
identifies a near-optimal subgoal design.
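No code accompanies the abstract, but the core idea of a cost-aware Bayesian optimization loop over subgoal designs can be illustrated with a toy sketch. The snippet below is an assumption-laden illustration, not the authors' algorithm: `evaluate_design`, the design space (a single 2D subgoal, the episode length, and the number of replications per trial), and the cost proxy are hypothetical placeholders, and the acquisition rule is a plain expected-improvement-per-unit-cost heuristic fitted with a scikit-learn Gaussian process.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def evaluate_design(subgoal_xy, episode_len, n_reps, rng):
    """Hypothetical stand-in for running subgoal-guided exploration in
    training environments; returns a noisy mean return (toy objective)."""
    noise = rng.normal(0.0, 0.05 / np.sqrt(n_reps))      # more replications -> less noise
    ret = -np.sum((subgoal_xy - np.array([0.7, 0.3])) ** 2) + noise
    return ret

def expected_improvement(mu, sigma, best):
    z = (mu - best) / np.maximum(sigma, 1e-9)
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X, y = [], []                        # evaluated designs and their returns
for _ in range(5):                   # initial random designs
    d = np.array([rng.uniform(), rng.uniform(),
                  rng.integers(10, 100), rng.integers(1, 5)], dtype=float)
    X.append(d)
    y.append(evaluate_design(d[:2], int(d[2]), int(d[3]), rng))

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(30):                  # cost-aware Bayesian optimization loop
    gp.fit(np.array(X), np.array(y))
    cand = np.column_stack([rng.uniform(size=200), rng.uniform(size=200),
                            rng.integers(10, 100, size=200),
                            rng.integers(1, 5, size=200)]).astype(float)
    mu, sigma = gp.predict(cand, return_std=True)
    est_cost = cand[:, 2] * cand[:, 3] * 1e-3             # proxy for interaction cost
    acq = expected_improvement(mu, sigma, max(y)) / est_cost  # EI per unit cost
    d = cand[np.argmax(acq)]
    X.append(d)
    y.append(evaluate_design(d[:2], int(d[2]), int(d[3]), rng))

best = int(np.argmax(y))
print("best subgoal design:", X[best], "estimated return:", y[best])
```

In the paper's setting, each evaluation would instead run the subgoal-guided exploration strategy in sampled training environments, with the interaction cost charged against a limited budget rather than the fixed cost proxy used here.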
Related papers
- Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space.
arXiv Detail & Related papers (2024-10-19T19:22:50Z)
- Model-Free Active Exploration in Reinforcement Learning [53.786439742572995]
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution.
Our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches.
arXiv Detail & Related papers (2024-06-30T19:00:49Z)
- OTO Planner: An Efficient Only Travelling Once Exploration Planner for Complex and Unknown Environments [6.128246045267511]
"Only Travelling Once Planner" is an efficient exploration planner that reduces repeated paths in complex environments.
It includes fast frontier updating, viewpoint evaluation and viewpoint refinement.
It reduces the exploration time and movement distance by 10%-20% and improves the speed of frontier detection by 6-9 times.
arXiv Detail & Related papers (2024-06-11T14:23:48Z)
- Adaptive trajectory-constrained exploration strategy for deep reinforcement learning [6.589742080994319]
Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces.
We propose an efficient adaptive trajectory-constrained exploration strategy for DRL.
We conduct experiments on two large 2D grid world mazes and several MuJoCo tasks.
arXiv Detail & Related papers (2023-12-27T07:57:15Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Sparse Reward Exploration via Novelty Search and Emitters [55.41644538483948]
We introduce the SparsE Reward Exploration via Novelty and Emitters (SERENE) algorithm.
SERENE separates the search space exploration and reward exploitation into two alternating processes.
A meta-scheduler allocates a global computational budget by alternating between the two processes.
arXiv Detail & Related papers (2021-02-05T12:34:54Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, which aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- A Provably Efficient Sample Collection Strategy for Reinforcement Learning [123.69175280309226]
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.
We propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) an "objective-specific" algorithm that prescribes how many samples to collect at which states, as if it had access to a generative model (i.e., a sparse simulator of the environment); 2) an "objective-agnostic" sample collection strategy responsible for generating the prescribed samples as fast as possible.
arXiv Detail & Related papers (2020-07-13T15:17:35Z)
- Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework [28.430845498323745]
We consider a reward-free reinforcement learning framework that separates exploration from exploitation.
In the exploration phase, the agent learns an exploratory policy by interacting with a reward-free environment.
In the planning phase, the agent computes a good policy for any reward function based on the dataset.
arXiv Detail & Related papers (2020-06-11T05:05:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.