Deterministic Exploration via Stationary Bellman Error Maximization
- URL: http://arxiv.org/abs/2410.23840v2
- Date: Tue, 05 Nov 2024 09:52:13 GMT
- Title: Deterministic Exploration via Stationary Bellman Error Maximization
- Authors: Sebastian Griesbach, Carlo D'Eramo
- Abstract summary: Exploration is a crucial and distinctive aspect of reinforcement learning (RL) that remains a fundamental open problem.
In this paper, we introduce three modifications to stabilize Bellman error maximization as a separate exploration objective and arrive at a deterministic exploration policy.
Our experimental results show that our approach can outperform $\varepsilon$-greedy in dense and sparse reward settings.
- Score: 6.474106100512158
- Abstract: Exploration is a crucial and distinctive aspect of reinforcement learning (RL) that remains a fundamental open problem. Several methods have been proposed to tackle this challenge. Commonly used methods inject random noise directly into the actions, indirectly via entropy maximization, or add intrinsic rewards that encourage the agent to steer toward novel regions of the state space. Another previously proposed idea is to use the Bellman error as a separate optimization objective for exploration. In this paper, we introduce three modifications to stabilize the latter and arrive at a deterministic exploration policy. Our separate exploration agent is informed about the state of the exploitation agent, thus enabling it to account for previous experiences. Further components are introduced to make the exploration objective agnostic toward the episode length and to mitigate the instability introduced by far-off-policy learning. Our experimental results show that our approach can outperform $\varepsilon$-greedy in dense and sparse reward settings.
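The core idea of the paper, a separate deterministic exploration policy trained to maximize the exploitation learner's Bellman error, can be illustrated with a minimal sketch. The snippet below assumes a DDPG-style continuous-control setup in which an auxiliary network `error_net` regresses the exploitation critic's absolute TD error and a deterministic actor ascends its predictions; the network sizes, the auxiliary error critic, and the random placeholder batch are illustrative assumptions rather than the authors' implementation, and the paper's further components (informing the explorer about the exploitation agent's state, episode-length agnosticism, far-off-policy stabilization) are omitted.

```python
import torch
import torch.nn as nn

state_dim, action_dim, gamma, batch = 8, 2, 0.99, 32

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, out))

q_net = mlp(state_dim + action_dim, 1)       # exploitation critic Q(s, a)
q_target = mlp(state_dim + action_dim, 1)    # target copy of Q
error_net = mlp(state_dim + action_dim, 1)   # predicts |Bellman error| for (s, a)
explore_actor = nn.Sequential(               # deterministic exploration policy, actions in [-1, 1]
    nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())

opt_error = torch.optim.Adam(error_net.parameters(), lr=3e-4)
opt_actor = torch.optim.Adam(explore_actor.parameters(), lr=3e-4)

# Placeholder replay-buffer batch; random tensors stand in for real transitions.
s = torch.randn(batch, state_dim)
a = torch.rand(batch, action_dim) * 2 - 1
r = torch.randn(batch, 1)
s_next = torch.randn(batch, state_dim)
a_next = torch.rand(batch, action_dim) * 2 - 1  # exploitation policy's action at s_next

# 1) Fit error_net to the exploitation critic's absolute Bellman (TD) error.
with torch.no_grad():
    td_target = r + gamma * q_target(torch.cat([s_next, a_next], dim=-1))
    bellman_error = (td_target - q_net(torch.cat([s, a], dim=-1))).abs()
error_loss = ((error_net(torch.cat([s, a], dim=-1)) - bellman_error) ** 2).mean()
opt_error.zero_grad()
error_loss.backward()
opt_error.step()

# 2) Update the deterministic exploration actor to pick actions that maximize
#    the predicted Bellman error (gradient ascent via a negated loss).
actor_loss = -error_net(torch.cat([s, explore_actor(s)], dim=-1)).mean()
opt_actor.zero_grad()
actor_loss.backward()
opt_actor.step()
```

In a full agent, the exploitation critic would be trained with its usual TD loss and experience would be gathered by both policies; the sketch only shows the exploration-specific updates.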
Related papers
- First Go, then Post-Explore: the Benefits of Post-Exploration in
Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state.
We refer to such exploration after a goal is reached as 'post-exploration'.
arXiv Detail & Related papers (2022-12-06T18:56:47Z) - Guarantees for Epsilon-Greedy Reinforcement Learning with Function
Approximation [69.1524391595912]
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks.
This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.
arXiv Detail & Related papers (2022-06-19T14:44:40Z) - When to Go, and When to Explore: The Benefit of Post-Exploration in
Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
We refer to such exploration after a goal is reached as 'post-exploration'.
We introduce a new methodology to adaptively decide when and for how long to post-explore.
arXiv Detail & Related papers (2022-03-29T16:50:12Z) - Residual Overfit Method of Exploration [78.07532520582313]
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
arXiv Detail & Related papers (2021-10-06T17:05:33Z) - Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z) - ADER: Adapting between Exploration and Robustness for Actor-Critic
Methods [8.750251598581102]
We show that TD3's performance lags behind the vanilla actor-critic methods in some primitive environments.
We propose a novel algorithm for this problem that ADapts between Exploration and Robustness, namely ADER.
Experiments in several challenging environments demonstrate the superiority of the proposed method in continuous control tasks.
arXiv Detail & Related papers (2021-09-08T05:48:39Z) - Temporal Difference Uncertainties as a Signal for Exploration [76.6341354269013]
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
arXiv Detail & Related papers (2020-10-05T18:11:22Z) - Fast active learning for pure exploration in reinforcement learning [48.98199700043158]
We show that bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon.
We also show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting.
arXiv Detail & Related papers (2020-07-27T11:28:32Z) - Intrinsic Exploration as Multi-Objective RL [29.124322674133]
Intrinsic motivation enables reinforcement learning (RL) agents to explore when rewards are very sparse.
We propose a framework based on multi-objective RL where exploration and exploitation are optimized as separate objectives.
This formulation brings the balance between exploration and exploitation to the policy level, resulting in advantages over traditional methods.
arXiv Detail & Related papers (2020-04-06T02:37:29Z) - Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies (a rough sketch of such a bonus appears after this list).
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z) - Long-Term Visitation Value for Deep Exploration in Sparse Reward
Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
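The episodic k-nearest-neighbors intrinsic reward mentioned in the Never Give Up entry above can be sketched as follows. This is a simplified illustration, not the published method: the kernel shape, the constants, and the use of a raw stand-in embedding are assumptions, whereas the actual agent trains its embedding with a self-supervised inverse dynamics model as noted in that entry.

```python
import numpy as np

def episodic_novelty_bonus(embedding, memory, k=10, c=1e-3):
    """Return a bonus that is large when `embedding` is far from everything
    stored in the per-episode memory and small when it is familiar."""
    if len(memory) == 0:
        return 1.0
    dists = np.linalg.norm(np.asarray(memory) - embedding, axis=1)
    knn = np.sort(dists)[:k]                                  # k nearest neighbors
    d_mean = knn.mean() + 1e-8                                # normalize distances by their mean
    similarities = 1.0 / (knn ** 2 / d_mean ** 2 + 1.0)       # kernel similarity per neighbor
    return float(1.0 / np.sqrt(similarities.sum() + c))       # low total similarity -> high bonus

# Usage: append each step's embedding, reset the memory at episode boundaries.
memory = []
for _ in range(5):
    emb = np.random.randn(16)                                 # stand-in for a learned embedding
    bonus = episodic_novelty_bonus(emb, memory)
    memory.append(emb)
```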
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.