MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement
Learning and Procedurally Generated Environments
- URL: http://arxiv.org/abs/2107.09996v1
- Date: Wed, 21 Jul 2021 10:29:39 GMT
- Title: MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement
Learning and Procedurally Generated Environments
- Authors: Dimitrios I. Koutras, Athanasios Ch. Kapoutsis, Angelos A.
Amanatiadis, Elias B. Kosmatopoulos
- Abstract summary: MarsExplorer is an OpenAI-Gym-compatible environment tailored to the exploration/coverage of unknown areas.
It translates the original robotics problem into a Reinforcement Learning setup that various off-the-shelf algorithms can tackle.
Four different state-of-the-art RL algorithms (A3C, PPO, Rainbow, and SAC) are trained on the MarsExplorer environment.
- Score: 0.7742297876120561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper is an initial endeavor to bridge the gap between powerful Deep
Reinforcement Learning methodologies and the problem of exploration/coverage of
unknown terrains. Within this scope, MarsExplorer, an OpenAI-Gym-compatible
environment tailored to exploration/coverage of unknown areas, is presented.
MarsExplorer translates the original robotics problem into a Reinforcement
Learning setup that various off-the-shelf algorithms can tackle. Any learned
policy can be applied straightforwardly to a robotic platform without requiring
an elaborate simulation model of the robot's dynamics or an additional
learning/adaptation phase. One of its core features is the controllable
multi-dimensional procedural generation of terrains, which is the key for
producing policies with strong generalization capabilities. Four different
state-of-the-art RL algorithms (A3C, PPO, Rainbow, and SAC) are trained on the
MarsExplorer environment, and their results are evaluated against average
human-level performance. In the follow-up experimental
analysis, the effect of the multi-dimensional difficulty setting on the
learning capabilities of the best-performing algorithm (PPO) is analyzed. A
milestone result is the emergence of an exploration policy that follows the
Hilbert curve, even though this information is never provided to the environment
and Hilbert-curve-like trajectories are never directly or indirectly rewarded.
The experimental analysis concludes by comparing the PPO-learned policy against
frontier-based exploration on extended terrain sizes. The source code
can be found at: https://github.com/dimikout3/GeneralExplorationPolicy.
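For orientation, the sketch below shows the plain Gym-style interaction loop that such an environment exposes. The environment ID "MarsExplorer-v0" is a placeholder (the actual registration name and terrain-generation options are defined in the linked repository), and a random policy stands in for a trained agent.

```python
# Minimal sketch of the Gym-style loop an off-the-shelf RL algorithm would drive.
# NOTE: "MarsExplorer-v0" is a hypothetical ID; the real registration name and
# procedural-terrain configuration live in the repository linked above.
import gym

env = gym.make("MarsExplorer-v0")        # hypothetical environment ID
obs = env.reset()
done, episode_return = False, 0.0

while not done:
    action = env.action_space.sample()   # random policy as a stand-in for PPO/SAC
    obs, reward, done, info = env.step(action)
    episode_return += reward

print(f"episode return: {episode_return:.2f}")
```

Because the interface is the standard Gym API, off-the-shelf implementations of A3C, PPO, Rainbow, or SAC (e.g., from RLlib or Stable-Baselines3) can be trained on the environment directly, which is the workflow the abstract describes.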
Related papers
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores across all tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z)
- Curiosity & Entropy Driven Unsupervised RL in Multiple Environments [0.0]
We propose and experiment with five new modifications to the original work.
In high-dimensional environments, curiosity-driven exploration enhances learning by encouraging the agent to seek diverse experiences and explore the unknown more.
However, its benefits are limited in low-dimensional and simpler environments where exploration possibilities are constrained and there is little that is truly unknown to the agent.
arXiv Detail & Related papers (2024-01-08T19:25:40Z)
- ReProHRL: Towards Multi-Goal Navigation in the Real World using Hierarchical Agents [1.3194749469702445]
We present Ready for Production Hierarchical RL (ReProHRL), which divides tasks using hierarchical multi-goal navigation guided by reinforcement learning.
We also use object detectors as a pre-processing step to learn multi-goal navigation and transfer it to the real world.
For the real-world implementation and proof of concept demonstration, we deploy the proposed method on a nano-drone named Crazyflie with a front camera.
arXiv Detail & Related papers (2023-08-17T02:23:59Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
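For intuition only, the snippet below sketches one generic way to keep noise temporally correlated across timesteps (an Ornstein-Uhlenbeck-style update on a latent vector). It is not the Lattice formulation from the paper; all names are hypothetical and it merely illustrates the general idea of correlated latent perturbations.

```python
# Generic illustration of temporally correlated noise on a latent vector.
# NOT the Lattice algorithm itself; see the paper for the actual method.
import torch

class CorrelatedLatentNoise:
    """Ornstein-Uhlenbeck-style noise whose state persists between timesteps,
    so successive perturbations are correlated rather than i.i.d."""

    def __init__(self, dim: int, theta: float = 0.15, sigma: float = 0.2):
        self.theta, self.sigma = theta, sigma
        self.state = torch.zeros(dim)

    def reset(self) -> None:
        self.state.zero_()

    def sample(self) -> torch.Tensor:
        # Mean-reverting update: the noise drifts back toward zero but persists.
        self.state += -self.theta * self.state + self.sigma * torch.randn_like(self.state)
        return self.state

# Hypothetical usage: perturb the policy's latent features before the action head.
noise = CorrelatedLatentNoise(dim=64)
latent = torch.randn(64)                  # stand-in for the policy's latent state
perturbed_latent = latent + noise.sample()
```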
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Deep Reinforcement Learning for Adaptive Exploration of Unknown Environments [6.90777229452271]
We develop an adaptive exploration approach for UAVs that trades off between exploration and exploitation in a single step.
The proposed approach uses a map segmentation technique to decompose the environment map into smaller, tractable maps.
The results demonstrate that the proposed approach can navigate randomly generated environments and cover more of the area of interest (AoI) in fewer time steps than the baselines.
arXiv Detail & Related papers (2021-05-04T16:29:44Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.