Curiosity & Entropy Driven Unsupervised RL in Multiple Environments
- URL: http://arxiv.org/abs/2401.04198v1
- Date: Mon, 8 Jan 2024 19:25:40 GMT
- Title: Curiosity & Entropy Driven Unsupervised RL in Multiple Environments
- Authors: Shaurya Dewan, Anisha Jain, Zoe LaLena, Lifan Yu
- Abstract summary: We propose and experiment with five new modifications to the original work.
In high-dimensional environments, curiosity-driven exploration enhances learning by encouraging the agent to seek diverse experiences and explore the unknown more.
However, its benefits are limited in low-dimensional and simpler environments where exploration possibilities are constrained and there is little that is truly unknown to the agent.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The authors of 'Unsupervised Reinforcement Learning in Multiple Environments'
propose a method, alpha-MEPOL, to tackle unsupervised RL across multiple
environments. They pre-train a task-agnostic exploration policy using
interactions from an entire environment class and then fine-tune this policy
for various tasks using supervision. We expanded upon this work, with the goal
of improving performance. We primarily propose and experiment with five new
modifications to the original work: sampling trajectories using an
entropy-based probability distribution, dynamic alpha, a higher KL-divergence
threshold, curiosity-driven exploration, and alpha-percentile sampling on
curiosity. Dynamic alpha and the higher KL-divergence threshold both provided a
significant improvement over the baseline from the earlier work. PDF-sampling
failed to provide any improvement because it is approximately equivalent to
the baseline method when the sample space is small. In high-dimensional
environments, the addition of curiosity-driven exploration enhances learning by
encouraging the agent to seek diverse experiences and explore the unknown more.
However, its benefits are limited in low-dimensional and simpler environments
where exploration possibilities are constrained and there is little that is
truly unknown to the agent. Overall, some of our experiments boosted
performance over the baseline, and a few directions seem promising for
further research.
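As a rough illustration of two of these modifications, the sketch below weights trajectory sampling by a per-trajectory entropy estimate and linearly anneals the alpha parameter over training. The function names, the softmax weighting, and the linear schedule are assumptions made for exposition; they are not the authors' implementation.

    import numpy as np

    def entropy_weighted_sample(trajectories, entropies, k, temperature=1.0):
        # Sample k trajectories with probability proportional to
        # exp(entropy / temperature): a sketch of "sampling trajectories
        # using an entropy-based probability distribution".
        logits = np.asarray(entropies, dtype=np.float64) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        idx = np.random.choice(len(trajectories), size=k, replace=False, p=probs)
        return [trajectories[i] for i in idx]

    def dynamic_alpha(step, total_steps, alpha_start=0.1, alpha_end=1.0):
        # One plausible "dynamic alpha" schedule: interpolate the parameter
        # over training. The schedule used in the paper is not given here,
        # so this is only an assumption.
        frac = min(step / max(total_steps, 1), 1.0)
        return alpha_start + frac * (alpha_end - alpha_start)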
Related papers
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE)
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL)
Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms.
It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Zipfian environments for Reinforcement Learning [19.309119596790563]
We develop three complementary RL environments where the agent's experience varies according to a Zipfian (discrete power law) distribution.
Our results show that learning robustly from skewed experience is a critical challenge for applying Deep RL methods beyond simulations or laboratories.
arXiv Detail & Related papers (2022-03-15T19:59:10Z)
- MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments [0.7742297876120561]
MarsExplorer is an OpenAI Gym-compatible environment tailored to exploration/coverage of unknown areas.
It translates the original robotics problem into a Reinforcement Learning setup that various off-the-shelf algorithms can tackle.
Four different state-of-the-art RL algorithms (A3C, PPO, Rainbow, and SAC) are trained on the MarsExplorer environment.
arXiv Detail & Related papers (2021-07-21T10:29:39Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
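As an illustrative aside, the snippet below sketches an episodic k-nearest-neighbour novelty bonus in the spirit of Never Give Up: the reward grows as the current state embedding gets farther from its nearest neighbours in episodic memory. The kernel and normalisation here are simplifications assumed for exposition and differ from the paper's exact formulation.

    import numpy as np

    def episodic_novelty_bonus(embedding, memory, k=10, eps=1e-3):
        # Larger bonus when the current state embedding is far from its
        # k nearest neighbours in the episodic memory of this episode.
        if len(memory) == 0:
            return 1.0
        dists = np.linalg.norm(np.asarray(memory) - embedding, axis=1)
        knn = np.sort(dists)[:k]
        # Inverse-distance style bonus: close neighbours shrink the reward.
        return 1.0 / np.sqrt(np.mean(knn ** 2) + eps)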