Learning for Visual Navigation by Imagining the Success
- URL: http://arxiv.org/abs/2103.00446v1
- Date: Sun, 28 Feb 2021 10:25:46 GMT
- Title: Learning for Visual Navigation by Imagining the Success
- Authors: Mahdi Kazemi Moghaddam, Ehsan Abbasnejad, Qi Wu, Javen Shi and Anton van den Hengel
- Abstract summary: We propose to learn to imagine a latent representation of the successful (sub-)goal state.
ForeSIT is trained to imagine the recurrent latent representation of a future state that leads to success.
We develop an efficient learning algorithm to train ForeSIT in an on-policy manner and integrate it into our RL objective.
- Score: 66.99810227193196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual navigation is often cast as a reinforcement learning (RL) problem.
Current methods typically result in a suboptimal policy that learns general
obstacle avoidance and search behaviours. For example, in the target-object
navigation setting, the policies learnt by traditional methods often fail to
complete the task, even when the target is clearly within reach from a human
perspective. In order to address this issue, we propose to learn to imagine a
latent representation of the successful (sub-)goal state. To do so, we have
developed a module which we call Foresight Imagination (ForeSIT). ForeSIT is
trained to imagine the recurrent latent representation of a future state that
leads to success, e.g. either a sub-goal state that is important to reach
before the target, or the goal state itself. By conditioning the policy on the
generated imagination during training, our agent learns how to use this
imagination to achieve its goal robustly. Our agent is able to imagine what the
(sub-)goal state may look like (in the latent space) and can learn to navigate
towards that state. We develop an efficient learning algorithm to train ForeSIT
in an on-policy manner and integrate it into our RL objective. The integration
is not trivial due to the constantly evolving state representation shared
between the imagination module and the policy. We empirically observe that our
method outperforms state-of-the-art methods by a large margin on the widely
used AI2THOR benchmark environment. Our method can be readily integrated into
other model-free RL navigation frameworks.
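The abstract describes the mechanism only at a high level. As a rough illustration, here is a minimal PyTorch sketch, assuming a GRU-based recurrent agent: an imagination module predicts the latent of a future success state, and the policy is conditioned on the concatenation of the current recurrent latent and that imagined latent. All names (ForesightImagination, NavAgent, imagination_loss) and the regression loss are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForesightImagination(nn.Module):
    """Hypothetical stand-in for ForeSIT: maps the agent's current
    recurrent latent state to an imagined latent of a (sub-)goal
    state that leads to success."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, h):
        return self.net(h)

class NavAgent(nn.Module):
    def __init__(self, obs_dim, hidden_dim, num_actions):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)  # placeholder for a visual encoder
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)  # recurrent state h_t
        self.imagine = ForesightImagination(hidden_dim)
        # Policy conditioned on [current latent, imagined (sub-)goal latent].
        self.policy = nn.Linear(2 * hidden_dim, num_actions)

    def step(self, obs, h):
        h = self.rnn(F.relu(self.encoder(obs)), h)
        g_hat = self.imagine(h)  # imagined success-state latent
        logits = self.policy(torch.cat([h, g_hat], dim=-1))
        return logits, h, g_hat

def imagination_loss(g_hat, h_success):
    # On-policy auxiliary target: regress the imagined latent onto the
    # recurrent state the agent actually reached in successful episodes.
    return F.mse_loss(g_hat, h_success.detach())
```

Detaching h_success is one simple way to reflect the difficulty the abstract flags: the state representation shared between the imagination module and the policy keeps evolving during training, so the regression target needs to be stabilized.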
Related papers
- Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL [19.757030674041037]
Embodied visual tracking is a vital and challenging skill for embodied agents.
Existing methods suffer from inefficient training and poor generalization.
We propose a novel framework that combines visual foundation models and offline reinforcement learning.
arXiv Detail & Related papers (2024-04-15T15:12:53Z)
- Zero-Shot Object Goal Visual Navigation With Class-Independent Relationship Network [3.0820097046465285]
"Zero-shot" means that the target the agent needs to find is not trained during the training phase.
We propose the Class-Independent Relationship Network (CIRN) to address the issue of coupling navigation ability with target features during training.
Our method outperforms the current state-of-the-art approaches in the zero-shot object goal visual navigation task.
arXiv Detail & Related papers (2023-10-15T16:42:14Z)
- Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning [71.52722621691365]
Building generalizable goal-conditioned agents from rich observations is key for reinforcement learning (RL) to solve real-world problems.
We propose a new form of state abstraction called goal-conditioned bisimulation.
We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulated manipulation tasks.
arXiv Detail & Related papers (2022-04-27T17:00:11Z)
- Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation [97.17517060585875]
We present a unified approach to visual navigation using a novel modular transfer learning model.
Our model can effectively leverage its experience from one source task and apply it to multiple target tasks.
Our approach learns faster, generalizes better, and outperforms SoTA models by a significant margin.
arXiv Detail & Related papers (2022-02-05T00:07:21Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- ViNG: Learning Open-World Navigation with Visual Goals [82.84193221280216]
We propose a learning-based navigation system for reaching visually indicated goals.
We show that our system, which we call ViNG, outperforms previously proposed methods for goal-conditioned reinforcement learning.
We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
arXiv Detail & Related papers (2020-12-17T18:22:32Z)
- Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships [52.72020203771489]
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes.
Our proposed method combines visual features and 3D spatial representations to learn navigation policy.
Our experiments, performed in AI2-THOR, show that our model outperforms the baselines on both SR (success rate) and SPL (success weighted by path length) metrics.
arXiv Detail & Related papers (2020-04-29T08:46:38Z)
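The SR and SPL metrics mentioned above are the standard evaluation measures for embodied navigation. For reference, a short sketch of SPL as defined by Anderson et al. (2018) in "On Evaluation of Embodied Navigation Agents"; variable names are illustrative:

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length (Anderson et al., 2018).

    successes[i]        -- 1.0 if episode i succeeded, else 0.0
    shortest_lengths[i] -- shortest-path distance from start to goal
    path_lengths[i]     -- distance the agent actually travelled
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        total += s * l / max(p, l)  # paths longer than optimal are penalized
    return total / len(successes)
```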