Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2406.08069v3
- Date: Wed, 18 Sep 2024 10:04:56 GMT
- Title: Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning
- Authors: Max Weltevrede, Felix Kaubek, Matthijs T. J. Spaan, Wendelin Böhmer
- Abstract summary: We show that increased exploration during training can be leveraged to increase the generalisation performance of the agent.
We propose a novel method, Explore-Go, that exploits this intuition by increasing the number of states on which the agent trains.
- Score: 5.624791703748109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the remaining challenges in reinforcement learning is to develop agents that can generalise to novel scenarios they might encounter once deployed. This challenge is often framed in a multi-task setting where agents train on a fixed set of tasks and have to generalise to new tasks. Recent work has shown that in this setting increased exploration during training can be leveraged to increase the generalisation performance of the agent. This makes sense when the states encountered during testing can actually be explored during training. In this paper, we provide intuition for why exploration can also benefit generalisation to states that cannot be explicitly encountered during training. Additionally, we propose a novel method, Explore-Go, that exploits this intuition by increasing the number of states on which the agent trains. Explore-Go effectively increases the starting state distribution of the agent and, as a result, can be used in conjunction with most existing on-policy or off-policy reinforcement learning algorithms. We show empirically that our method can increase generalisation performance in an illustrative environment and on the Procgen benchmark.
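The mechanism lends itself to a compact sketch. Below is a minimal, hypothetical rendering of the idea, assuming a Gymnasium-style environment and placeholder `explore_policy`, `agent.act`, and `agent.observe` interfaces; the paper's actual handling of the exploration experience may differ from simply discarding it as done here.

```python
import random

def explore_go_episode(env, agent, explore_policy, max_explore_steps=50):
    """One episode with an Explore-Go style pure-exploration prefix.

    A random-length exploration phase moves the agent away from the
    true starting state before ordinary training begins, which
    effectively enlarges the starting state distribution.
    """
    obs, _ = env.reset()

    # Phase 1: pure exploration; in this sketch these transitions are
    # discarded and only serve to relocate the episode's start.
    for _ in range(random.randint(0, max_explore_steps)):
        obs, _, terminated, truncated, _ = env.step(explore_policy(obs))
        if terminated or truncated:
            obs, _ = env.reset()

    # Phase 2: standard rollout; these transitions feed whatever
    # on-policy or off-policy RL algorithm sits underneath.
    done = False
    while not done:
        action = agent.act(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        agent.observe(obs, action, reward, next_obs, done)
        obs = next_obs
```

Because the wrapper only changes where episodes effectively begin, it composes with most existing agents, which matches the compatibility claim in the abstract.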
Related papers
- Training on more Reachable Tasks for Generalisation in Reinforcement Learning [5.855552389030083]
In multi-task reinforcement learning, agents train on a fixed set of tasks and have to generalise to new ones.
Recent work has shown that increased exploration improves this generalisation, but it remains unclear why exactly that is.
We introduce the concept of reachability in multi-task reinforcement learning and show that an initial exploration phase increases the number of reachable tasks the agent is trained on.
arXiv Detail & Related papers (2024-10-04T16:15:31Z)
- Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery [5.680463564655267]
The rule-driven deep Q-learning agent (RDQ) is proposed as one possible implementation of the framework.
We show that RDQ successfully extracts task-specific rules as it interacts with the world.
In experiments, we show that the RDQ agent is significantly more resilient to novelties than the baseline agents.
arXiv Detail & Related papers (2023-11-24T04:12:50Z)
- On the Importance of Exploration for Generalization in Reinforcement Learning [89.63074327328765]
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
arXiv Detail & Related papers (2023-06-08T18:07:02Z)
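As a loose illustration of the idea (not the authors' exact EDE algorithm), exploration can be driven by the disagreement of a value ensemble; the names below are hypothetical:

```python
import numpy as np

def uncertainty_greedy_action(q_ensemble, obs, beta=1.0):
    """Choose an action by mean Q-value plus an uncertainty bonus.

    q_ensemble: list of callables, each mapping an observation to a
    vector of per-action Q-values. The standard deviation across
    ensemble members proxies epistemic uncertainty, so actions the
    ensemble disagrees on get explored more (beta sets how much).
    """
    qs = np.stack([q(obs) for q in q_ensemble])  # (n_members, n_actions)
    return int(np.argmax(qs.mean(axis=0) + beta * qs.std(axis=0)))
```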
- Generalizing to New Tasks via One-Shot Compositional Subgoals [23.15624959305799]
The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research.
We introduce CASE, which attempts to address these issues by training an Imitation Learning agent using adaptive "near-future" subgoals.
Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.
arXiv Detail & Related papers (2022-05-16T14:30:11Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
We propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
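The persistence property can be sketched as follows, assuming hypothetical `save_state`/`load_state` simulator methods (not part of the standard Gym API), where `load_state` is assumed to return the restored observation:

```python
import random

def rollback_explore(env, explore_policy, n_iters=1000, steps_per_iter=32):
    """Exploration loop that repeatedly rolls back to visited states.

    An archive of simulator snapshots grows as the agent explores;
    each iteration restores a random snapshot and explores onwards,
    so long-term progress survives episode resets.
    """
    obs, _ = env.reset()
    archive = [env.save_state()]  # snapshots of visited states

    for _ in range(n_iters):
        obs = env.load_state(random.choice(archive))  # roll back (assumed API)
        for _ in range(steps_per_iter):
            obs, _, terminated, truncated, _ = env.step(explore_policy(obs))
            if terminated or truncated:
                break
            archive.append(env.save_state())
    return archive
```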
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
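One way to picture the two-player game is the sketch below, written under assumptions: surprise is proxied by the negative log-likelihood of a learned density model, and the policy and model interfaces are placeholders rather than the authors' implementation.

```python
def adversarial_surprise_episode(env, explorer, controller, density_model,
                                 phase_len=20, n_phases=4):
    """Alternate control between two policies competing over surprise.

    The Explorer is rewarded for surprising observations (+surprise),
    the Controller for keeping the world predictable (-surprise).
    """
    obs, _ = env.reset()
    for phase in range(n_phases):
        policy, sign = ((explorer, 1.0) if phase % 2 == 0
                        else (controller, -1.0))
        for _ in range(phase_len):
            action = policy.act(obs)
            next_obs, _, terminated, truncated, _ = env.step(action)
            surprise = -density_model.log_prob(next_obs)
            policy.observe(obs, action, sign * surprise, next_obs,
                           terminated or truncated)
            density_model.update(next_obs)  # keep the density model current
            obs = next_obs
            if terminated or truncated:
                return
```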
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims at encouraging exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
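A small sketch of what re-annealing a decayed exploration rate might look like; the trigger for calling `reanneal()` (e.g. stagnating evaluation returns) is an assumed heuristic, not necessarily the paper's measure:

```python
class ReannealedEpsilon:
    """Epsilon-greedy schedule that decays but can be raised again."""

    def __init__(self, start=1.0, end=0.05, decay=0.999, reanneal_to=0.5):
        self.eps = start
        self.end = end            # floor of the decay
        self.decay = decay        # multiplicative decay per step
        self.reanneal_to = reanneal_to

    def step(self):
        # Ordinary exponential decay towards the floor.
        self.eps = max(self.end, self.eps * self.decay)
        return self.eps

    def reanneal(self):
        # Called when a heuristic signals exploration is needed again.
        self.eps = max(self.eps, self.reanneal_to)
```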
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
We present Plan2Explore, a self-supervised reinforcement learning agent offering a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
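Plan2Explore derives its exploration signal from disagreement among an ensemble of one-step latent prediction models; a minimal sketch of such a disagreement bonus, with hypothetical interfaces:

```python
import numpy as np

def disagreement_bonus(ensemble, latent, action):
    """Intrinsic reward from ensemble disagreement about the next latent.

    ensemble: list of one-step models, each with a predict(latent, action)
    method returning the predicted next latent state. High variance
    across members marks poorly modelled, hence informative, states.
    """
    preds = np.stack([m.predict(latent, action) for m in ensemble])
    return float(preds.var(axis=0).mean())
```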
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
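The k-nearest-neighbour bonus described above can be sketched roughly as follows; the kernel and constants are simplified assumptions rather than the paper's exact formulation:

```python
import numpy as np

def episodic_novelty(memory, embedding, k=10, eps=1e-3):
    """Episodic intrinsic reward from k-NN distances in embedding space.

    memory: embeddings of states visited so far this episode (per the
    paper, produced by an inverse-dynamics-trained encoder). The bonus
    shrinks as similar embeddings accumulate in memory.
    """
    if not memory:
        return 1.0
    dists = np.sort([float(np.sum((m - embedding) ** 2)) for m in memory])[:k]
    dists = dists / (np.mean(dists) + eps)  # normalise by the mean distance
    kernel = eps / (dists + eps)            # inverse-distance kernel
    return float(1.0 / np.sqrt(np.sum(kernel) + eps))
```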