On the Importance of Exploration for Generalization in Reinforcement
Learning
- URL: http://arxiv.org/abs/2306.05483v1
- Date: Thu, 8 Jun 2023 18:07:02 GMT
- Title: On the Importance of Exploration for Generalization in Reinforcement
Learning
- Authors: Yiding Jiang, J. Zico Kolter, Roberta Raileanu
- Abstract summary: We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
- Score: 89.63074327328765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing approaches for improving generalization in deep reinforcement
learning (RL) have mostly focused on representation learning, neglecting
RL-specific aspects such as exploration. We hypothesize that the agent's
exploration strategy plays a key role in its ability to generalize to new
environments. Through a series of experiments in a tabular contextual MDP, we
show that exploration is helpful not only for efficiently finding the optimal
policy for the training environments but also for acquiring knowledge that
helps decision making in unseen environments. Based on these observations, we
propose EDE: Exploration via Distributional Ensemble, a method that encourages
exploration of states with high epistemic uncertainty through an ensemble of
Q-value distributions. Our algorithm is the first value-based approach to
achieve state-of-the-art on both Procgen and Crafter, two benchmarks for
generalization in RL with high-dimensional observations. The open-sourced
implementation can be found at https://github.com/facebookresearch/ede .
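As a rough illustration of the mechanism the abstract describes, the sketch below scores actions by the ensemble-mean Q-value plus an ensemble-disagreement bonus. The network shapes, the UCB-style rule, and the coefficient `beta` are illustrative assumptions, not the paper's exact algorithm; see the linked repository for the actual implementation.

```python
import torch

# Minimal sketch of exploration via disagreement across an ensemble of
# distributional Q-networks, in the spirit of EDE. Shapes and the bonus
# coefficient are assumptions for illustration only.

def ucb_action(q_ensemble, state, beta=1.0):
    """Pick an action by mean Q-value plus an epistemic-uncertainty bonus.

    q_ensemble: list of distributional Q-networks, each mapping a state to
        a (num_actions, num_quantiles) tensor of return quantiles.
    """
    with torch.no_grad():
        # (ensemble_size, num_actions): averaging over quantiles gives
        # each ensemble member's Q-value estimate per action.
        q_values = torch.stack([net(state).mean(dim=-1) for net in q_ensemble])
    q_mean = q_values.mean(dim=0)    # ensemble-average Q per action
    epistemic = q_values.std(dim=0)  # member disagreement ~ epistemic uncertainty
    return int(torch.argmax(q_mean + beta * epistemic))
```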
Related papers
- Search Inspired Exploration in Reinforcement Learning [5.411688702405822]
We propose a novel method that actively guides exploration by setting sub-goals based on the agent's learning progress. Inspired by search, sub-goals are prioritized from the frontier based on estimates of cost-to-come and cost-to-go. In experiments on challenging sparse-reward environments, SIERL outperforms dominant baselines in both achieving the main task goal and generalizing to reach arbitrary states in the environment.
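A minimal sketch of that A*-style prioritization, where `cost_to_come` and `cost_to_go` are hypothetical stand-ins for the learned estimates the paper describes:

```python
# Hypothetical sketch of search-inspired sub-goal selection: rank frontier
# states by estimated cost-to-come plus cost-to-go, as in A* search.

def select_subgoal(frontier, cost_to_come, cost_to_go):
    """Pick the frontier state minimizing the estimated total cost."""
    return min(frontier, key=lambda s: cost_to_come(s) + cost_to_go(s))
```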
arXiv Detail & Related papers (2026-01-31T02:24:22Z) - Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z) - Diversity-Incentivized Exploration for Versatile Reasoning [63.653348177250756]
We propose DIVER (Diversity-Incentivized Exploration for Versatile Reasoning), an innovative framework that highlights the pivotal role of global sequence-level diversity to incentivize deep exploration for versatile reasoning.
arXiv Detail & Related papers (2025-09-30T13:11:46Z) - From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR [92.51110344832178]
Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs). This technical report presents a systematic investigation of exploration capacities in RLVR, covering four main aspects.
arXiv Detail & Related papers (2025-08-11T01:26:16Z) - On Efficient Bayesian Exploration in Model-Based Reinforcement Learning [0.24578723416255752]
We address the challenge of data-efficient exploration in reinforcement learning by examining existing principled, information-theoretic approaches to intrinsic motivation. We prove that exploration bonuses naturally signal epistemic information gains and converge to zero once the agent becomes sufficiently certain about the environment's dynamics and rewards. We then outline a general framework, Predictive Trajectory Sampling with Bayesian Exploration (PTS-BE), which integrates model-based planning with information-theoretic bonuses to achieve sample-efficient deep exploration.
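In symbols (notation assumed here, not taken from the report), the kind of bonus described is an epistemic information gain about the model parameters $\theta$:

$$ b_t(s_t, a_t) \;=\; \mathbb{I}\big(\theta;\,(s_{t+1}, r_t)\,\big|\, s_t, a_t, \mathcal{D}_t\big), $$

which measures how much the next transition is expected to teach the agent about the dynamics and reward; once the posterior over $\theta$ given the collected data $\mathcal{D}_t$ concentrates, this mutual information, and hence the bonus, goes to zero.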
arXiv Detail & Related papers (2025-07-03T14:03:47Z) - Efficient Exploration in Deep Reinforcement Learning: A Novel Bayesian Actor-Critic Algorithm [0.195804735329484]
Reinforcement learning (RL) and deep reinforcement learning (DRL) have the potential to disrupt, and are already changing, the way we interact with the world.
One of the key indicators of their applicability is their ability to scale and work in real-world scenarios.
arXiv Detail & Related papers (2024-08-19T14:50:48Z) - Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches to effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores across all tasks than other approaches.
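As a rough, assumed illustration of pursuing randomly drawn latent directions (the paper's exact parameterization differs, and the names below are hypothetical):

```python
import numpy as np

# Assumed sketch of random-latent reward shaping: each episode draws a
# random latent vector z and perturbs the task reward with a state
# feature's alignment to z, steering the agent toward varied behaviors.

def shaped_reward(task_reward, state_features, z, alpha=0.1):
    """Task reward plus a random latent-conditioned bonus."""
    bonus = float(np.dot(state_features, z))  # alignment with the latent direction
    return task_reward + alpha * bonus

rng = np.random.default_rng(0)
z = rng.standard_normal(16)    # resampled at the start of each episode
z /= np.linalg.norm(z)         # keep the latent direction unit-norm
```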
arXiv Detail & Related papers (2024-07-18T17:55:22Z) - Sample Efficient Myopic Exploration Through Multitask Reinforcement
Learning with Diverse Tasks [53.44714413181162]
This paper shows that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design can be sample-efficient.
To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL.
arXiv Detail & Related papers (2024-03-03T22:57:44Z) - DEIR: Efficient and Robust Exploration through
Discriminative-Model-Based Episodic Intrinsic Rewards [2.09711130126031]
Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness is a deciding factor in the performance of RL algorithms.
Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from novelties in observations.
We propose DEIR, a novel method whose intrinsic reward is theoretically derived from a conditional mutual information term.
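A hypothetical sketch of the genre of episodic, discriminative-model-scaled intrinsic reward; DEIR's actual derivation (the conditional mutual information term) is more involved, so treat this purely as an illustration:

```python
import numpy as np

# Hypothetical episodic novelty bonus: distance to the nearest embedding
# seen this episode, scaled by how genuinely new a learned discriminator
# deems the step. Not DEIR's exact formulation.

def episodic_bonus(embedding, episodic_memory, discriminator_score):
    """Episodic novelty scaled by a discriminative-model confidence."""
    if not episodic_memory:
        return 1.0
    dists = [np.linalg.norm(embedding - m) for m in episodic_memory]
    return min(dists) * discriminator_score
```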
arXiv Detail & Related papers (2023-04-21T06:39:38Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
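A minimal sketch of the two-policy idea: a guide policy acts for the first `h` steps of an episode, then the learning policy takes over, and shrinking `h` as performance improves yields a curriculum. The gym-style step/reset interface is an assumption here, not the paper's code:

```python
# Assumed gym-style environment API: reset() -> obs,
# step(action) -> (obs, reward, done, info).

def jumpstart_episode(env, guide_policy, explore_policy, h):
    """Roll in with the guide policy for h steps, then explore."""
    obs, done, t, total = env.reset(), False, 0, 0.0
    while not done:
        policy = guide_policy if t < h else explore_policy
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        t += 1
    return total
```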
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Case-based Reasoning for Better Generalization in Text-Adventure Games [15.652823459179048]
We propose a general method inspired by case-based reasoning to train agents that generalize outside the training distribution.
Our experiments show that the proposed approach consistently improves existing methods, obtains good out-of-distribution generalization, and achieves new state-of-the-art results on widely used environments.
arXiv Detail & Related papers (2021-10-16T04:51:34Z) - Exploration in Deep Reinforcement Learning: A Comprehensive Survey [24.252352133705735]
Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant success across a wide range of domains, such as game AI, autonomous vehicles, robotics and finance.
DRL and deep MARL agents are widely known to be sample-inefficient and millions of interactions are usually needed even for relatively simple game settings.
This paper provides a comprehensive survey on existing exploration methods in DRL and deep MARL.
arXiv Detail & Related papers (2021-09-14T13:16:33Z) - Variational Empowerment as Representation Learning for Goal-Based
Reinforcement Learning [114.07623388322048]
We discuss how the standard goal-conditioned RL (GCRL) objective is encapsulated by variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z) - Reannealing of Decaying Exploration Based On Heuristic Measure in Deep
Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
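An illustrative sketch of reannealing a decaying epsilon-greedy schedule: epsilon decays as usual but is reset upward whenever a heuristic signals that more exploration is needed. The trigger and constants below are assumptions, not the paper's heuristic measure:

```python
# Hypothetical reannealing rule for an epsilon-greedy schedule.

def update_epsilon(epsilon, needs_exploration, decay=0.999,
                   floor=0.05, reanneal_to=0.5):
    """Decay epsilon, but jump it back up when exploration is needed."""
    if needs_exploration:           # heuristic trigger, e.g. learning has stalled
        return reanneal_to          # reanneal: restore a high exploration rate
    return max(floor, epsilon * decay)
```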
arXiv Detail & Related papers (2020-09-29T20:40:00Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
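A compact sketch of one way such a regularizer can look: the RL loss is augmented with an annealed KL penalty on a stochastic state encoding, gradually squeezing out task-irrelevant information. The distribution choices and linear warm-up schedule are illustrative assumptions:

```python
import torch

# Assumed information-bottleneck penalty: RL loss + beta * KL(q(z|s) || p(z)),
# with beta annealed upward over training to tighten the bottleneck.

def ib_loss(rl_loss, z_posterior, z_prior, step, total_steps, beta_max=1e-3):
    """Annealed KL-regularized objective on a stochastic encoding z."""
    beta = beta_max * min(1.0, step / (0.5 * total_steps))  # linear warm-up
    kl = torch.distributions.kl_divergence(z_posterior, z_prior).mean()
    return rl_loss + beta * kl
```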
arXiv Detail & Related papers (2020-08-03T02:24:20Z) - Intrinsic Exploration as Multi-Objective RL [29.124322674133]
Intrinsic motivation enables reinforcement learning (RL) agents to explore when rewards are very sparse.
We propose a framework based on multi-objective RL where exploration and exploitation are optimized as separate objectives.
This formulation brings the balance between exploration and exploitation to the policy level, yielding advantages over traditional methods.
arXiv Detail & Related papers (2020-04-06T02:37:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.