Focus on Impact: Indoor Exploration with Intrinsic Motivation
- URL: http://arxiv.org/abs/2109.08521v1
- Date: Tue, 14 Sep 2021 18:00:07 GMT
- Title: Focus on Impact: Indoor Exploration with Intrinsic Motivation
- Authors: Roberto Bigazzi, Federico Landi, Silvia Cascianelli, Lorenzo Baraldi,
Marcella Cornia and Rita Cucchiara
- Abstract summary: In this work, we propose to train a model with a purely intrinsic reward signal to guide exploration.
We include a neural-based density model and replace the traditional count-based regularization with an estimated pseudo-count of previously visited states.
We also show that a robot equipped with the proposed approach seamlessly adapts to point-goal navigation and real-world deployment.
- Score: 45.97756658635314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration of indoor environments has recently attracted significant
interest, thanks in part to the introduction of deep neural agents built in a
hierarchical fashion and trained with Deep Reinforcement Learning (DRL) in
simulated environments. Current state-of-the-art methods employ a dense
extrinsic reward that requires complete a priori knowledge of the layout of
the training environment to learn an effective exploration policy. However,
such information is expensive to gather in terms of time and resources. In this
work, we propose to train the model with a purely intrinsic reward signal to
guide exploration, which is based on the impact of the robot's actions on the
environment. So far, impact-based rewards have been employed for simple tasks
and in procedurally generated synthetic environments with countable states.
Since the number of states observable by the agent in realistic indoor
environments is non-countable, we include a neural-based density model and
replace the traditional count-based regularization with an estimated
pseudo-count of previously visited states. The proposed exploration approach
outperforms DRL-based competitors relying on intrinsic rewards and surpasses
the agents trained with a dense extrinsic reward computed with the environment
layouts. We also show that a robot equipped with the proposed approach
seamlessly adapts to point-goal navigation and real-world deployment.
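The core mechanism described in the abstract is an intrinsic reward proportional to the change the agent's action produces in a learned feature space, normalized by how familiar the resulting observation is. Below is a minimal sketch of that idea, assuming a RIDE-style formulation in which the impact term is divided by the square root of a visitation estimate; `phi` (the observation encoder) and `pseudo_count` (the visitation estimator) are hypothetical helpers standing in for the paper's learned components, and the exact reward shaping used by the authors may differ.

```python
import numpy as np

def impact_reward(phi, obs_t, obs_tp1, pseudo_count, eps=1e-8):
    """Impact-style intrinsic reward for one transition (obs_t -> obs_tp1).

    phi          : learned observation encoder mapping an observation to a
                   feature vector (hypothetical stand-in).
    pseudo_count : callable returning an estimated visit count for obs_tp1,
                   e.g. derived from a neural density model, since states in
                   realistic indoor environments cannot be enumerated.
    """
    # Impact: how much the action changed the agent's learned representation.
    impact = np.linalg.norm(phi(obs_tp1) - phi(obs_t))
    # Normalize by a (pseudo-)visitation estimate so that revisiting familiar
    # states yields a progressively smaller bonus.
    n_hat = max(float(pseudo_count(obs_tp1)), 0.0)
    return impact / np.sqrt(n_hat + eps)
```

In a training loop this bonus would be used in place of (or added to) an extrinsic reward when updating the DRL exploration policy; the abstract's claim is that the purely intrinsic version already suffices.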
Related papers
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores across all tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z) - Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z) - Successor-Predecessor Intrinsic Exploration [18.440869985362998]
We focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards.
We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information.
We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods.
arXiv Detail & Related papers (2023-05-24T16:02:51Z) - Self-supervised network distillation: an effective approach to exploration in sparse reward environments [0.0]
Reinforcement learning can train an agent to behave in an environment according to a predesigned reward function, but learning is difficult when that reward is sparse.
A solution to this problem may be to equip the agent with an intrinsic motivation that provides informed exploration.
We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator.
arXiv Detail & Related papers (2023-02-22T18:58:09Z) - Active Exploration for Inverse Reinforcement Learning [58.295273181096036]
We propose a novel IRL algorithm: Active exploration for Inverse Reinforcement Learning (AceIRL)
AceIRL actively explores an unknown environment and expert policy to quickly learn the expert's reward function and identify a good policy.
We empirically evaluate AceIRL in simulations and find that it significantly outperforms more naive exploration strategies.
arXiv Detail & Related papers (2022-07-18T14:45:55Z) - Multitask Adaptation by Retrospective Exploration with Learned World
Models [77.34726150561087]
We propose a meta-learned addressing model, RAMa, that provides the MBRL agent with training samples drawn from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting promising trajectories that solved prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z) - Online reinforcement learning with sparse rewards through an active
inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z) - Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning [12.76337275628074]
In this work, we propose a variational dynamic model based on conditional variational inference to model the multimodality and stochasticity of environment transitions.
We derive an upper bound on the negative log-likelihood of the environmental transition and use this upper bound as the intrinsic reward for exploration.
Our method outperforms several state-of-the-art environment model-based exploration approaches.
arXiv Detail & Related papers (2020-10-17T09:54:51Z) - RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated
Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
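The RIDE entry above normalizes its impact bonus with an episodic state-visitation count; the main paper instead estimates a pseudo-count from a neural density model so that the same idea carries over to non-countable observation spaces. Below is a minimal sketch of the standard pseudo-count construction from the count-based exploration literature; the `density_model` interface (`prob`/`update`) is a hypothetical stand-in, and the authors' estimator may be implemented differently.

```python
def estimate_pseudo_count(density_model, obs, eps=1e-8):
    """Pseudo-count of `obs` derived from a learned density model.

    Uses the classic construction
        N_hat(s) = rho(s) * (1 - rho'(s)) / (rho'(s) - rho(s)),
    where rho(s) is the model's probability of `obs` before an update and
    rho'(s) (the "recoding" probability) is its probability after training
    on `obs` one more time. `density_model` is a hypothetical object
    exposing prob(obs) -> float and update(obs) -> None.
    """
    rho = density_model.prob(obs)         # density before observing obs again
    density_model.update(obs)             # single training step on obs
    rho_prime = density_model.prob(obs)   # recoding probability
    gain = max(rho_prime - rho, eps)      # guard against numerical issues
    return rho * (1.0 - rho_prime) / gain
```

The resulting estimate can be plugged directly into the `pseudo_count` argument of the impact-reward sketch shown after the abstract.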