Dynamic Subgoal-based Exploration via Bayesian Optimization
- URL: http://arxiv.org/abs/1910.09143v5
- Date: Thu, 12 Oct 2023 17:27:48 GMT
- Title: Dynamic Subgoal-based Exploration via Bayesian Optimization
- Authors: Yijia Wang, Matthias Poloczek, Daniel R. Jiang
- Abstract summary: Reinforcement learning in sparse-reward navigation environments is challenging and poses a need for effective exploration.
We propose a cost-aware Bayesian optimization approach that efficiently searches over a class of dynamic subgoal-based exploration strategies.
An experimental evaluation demonstrates that the new approach outperforms existing baselines across a number of problem domains.
- Score: 7.297146495243708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning in sparse-reward navigation environments with
expensive and limited interactions is challenging and poses a need for
effective exploration. Motivated by complex navigation tasks that require
real-world training (when cheap simulators are not available), we consider an
agent that faces an unknown distribution of environments and must decide on an
exploration strategy. It may leverage a series of training environments to
improve its policy before it is evaluated in a test environment drawn from the
same environment distribution. Most existing approaches focus on fixed
exploration strategies, while the few that view exploration as a
meta-optimization problem tend to ignore the need for cost-efficient
exploration. We propose a cost-aware Bayesian optimization approach that
efficiently searches over a class of dynamic subgoal-based exploration
strategies. The algorithm adjusts a variety of levers -- the locations of the
subgoals, the length of each episode, and the number of replications per trial
-- in order to overcome the challenges of sparse rewards, expensive
interactions, and noise. An experimental evaluation demonstrates that the new
approach outperforms existing baselines across a number of problem domains. We
also provide a theoretical foundation and prove that the method asymptotically
identifies a near-optimal subgoal design.
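No code accompanies the abstract, but the core idea of a cost-aware Bayesian optimization loop over subgoal designs can be illustrated with a toy sketch. The snippet below is an assumption-laden illustration, not the authors' algorithm: `evaluate_design`, the design space (a single 2D subgoal, the episode length, and the number of replications per trial), and the cost proxy are hypothetical placeholders, and the acquisition rule is a plain expected-improvement-per-unit-cost heuristic fitted with a scikit-learn Gaussian process.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def evaluate_design(subgoal_xy, episode_len, n_reps, rng):
    """Hypothetical stand-in for running subgoal-guided exploration in
    training environments; returns a noisy mean return (toy objective)."""
    noise = rng.normal(0.0, 0.05 / np.sqrt(n_reps))      # more replications -> less noise
    ret = -np.sum((subgoal_xy - np.array([0.7, 0.3])) ** 2) + noise
    return ret

def expected_improvement(mu, sigma, best):
    z = (mu - best) / np.maximum(sigma, 1e-9)
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X, y = [], []                        # evaluated designs and their returns
for _ in range(5):                   # initial random designs
    d = np.array([rng.uniform(), rng.uniform(),
                  rng.integers(10, 100), rng.integers(1, 5)], dtype=float)
    X.append(d)
    y.append(evaluate_design(d[:2], int(d[2]), int(d[3]), rng))

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(30):                  # cost-aware Bayesian optimization loop
    gp.fit(np.array(X), np.array(y))
    cand = np.column_stack([rng.uniform(size=200), rng.uniform(size=200),
                            rng.integers(10, 100, size=200),
                            rng.integers(1, 5, size=200)]).astype(float)
    mu, sigma = gp.predict(cand, return_std=True)
    est_cost = cand[:, 2] * cand[:, 3] * 1e-3             # proxy for interaction cost
    acq = expected_improvement(mu, sigma, max(y)) / est_cost  # EI per unit cost
    d = cand[np.argmax(acq)]
    X.append(d)
    y.append(evaluate_design(d[:2], int(d[2]), int(d[3]), rng))

best = int(np.argmax(y))
print("best subgoal design:", X[best], "estimated return:", y[best])
```

In the paper's setting, each evaluation would instead run the subgoal-guided exploration strategy in sampled training environments, with the interaction cost charged against a limited budget rather than the fixed cost proxy used here.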
Related papers
- Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space.
arXiv Detail & Related papers (2024-10-19T19:22:50Z)
- Model-Free Active Exploration in Reinforcement Learning [53.786439742572995]
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution.
Our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches.
arXiv Detail & Related papers (2024-06-30T19:00:49Z)
- OTO Planner: An Efficient Only Travelling Once Exploration Planner for Complex and Unknown Environments [6.128246045267511]
"Only Travelling Once Planner" is an efficient exploration planner that reduces repeated paths in complex environments.
It includes fast frontier updating, viewpoint evaluation and viewpoint refinement.
It reduces the exploration time and movement distance by 10%-20% and improves the speed of frontier detection by 6-9 times.
arXiv Detail & Related papers (2024-06-11T14:23:48Z)
- Adaptive trajectory-constrained exploration strategy for deep reinforcement learning [6.589742080994319]
Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces.
We propose an efficient adaptive trajectory-constrained exploration strategy for DRL.
We conduct experiments on two large 2D grid world mazes and several MuJoCo tasks.
arXiv Detail & Related papers (2023-12-27T07:57:15Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Sparse Reward Exploration via Novelty Search and Emitters [55.41644538483948]
We introduce the SparsE Reward Exploration via Novelty and Emitters (SERENE) algorithm.
SERENE separates the search space exploration and reward exploitation into two alternating processes.
A meta-scheduler allocates a global computational budget by alternating between the two processes.
arXiv Detail & Related papers (2021-02-05T12:34:54Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, which aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- A Provably Efficient Sample Collection Strategy for Reinforcement Learning [123.69175280309226]
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.
We propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) an "objective-specific" algorithm that prescribes how many samples to collect at which states, as if it had access to a generative model (i.e., a sparse simulator of the environment); 2) an "objective-agnostic" sample collection strategy responsible for generating the prescribed samples as fast as possible.
arXiv Detail & Related papers (2020-07-13T15:17:35Z)
- Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework [28.430845498323745]
We consider a reward-free reinforcement learning framework that separates exploration from exploitation.
In the exploration phase, the agent learns an exploratory policy by interacting with a reward-free environment.
In the planning phase, the agent computes a good policy for any reward function based on the dataset.
arXiv Detail & Related papers (2020-06-11T05:05:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.