Semantic Visual Navigation by Watching YouTube Videos
- URL: http://arxiv.org/abs/2006.10034v2
- Date: Tue, 27 Oct 2020 05:46:46 GMT
- Title: Semantic Visual Navigation by Watching YouTube Videos
- Authors: Matthew Chang, Arjun Gupta, Saurabh Gupta
- Abstract summary: This paper learns and leverages semantic cues for navigating to objects of interest in novel environments, simply by watching YouTube videos.
We show that such off-policy Q-learning from passive data is able to learn meaningful semantic cues for navigation.
We observe a relative improvement of 15-83% over end-to-end RL, behavior cloning, and classical methods, while using minimal direct interaction.
- Score: 17.76847333440422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic cues and statistical regularities in real-world environment layouts
can improve efficiency for navigation in novel environments. This paper learns
and leverages such semantic cues for navigating to objects of interest in novel
environments, by simply watching YouTube videos. This is challenging because
YouTube videos don't come with labels for actions or goals, and may not even
showcase optimal behavior. Our method tackles these challenges through the use
of Q-learning on pseudo-labeled transition quadruples (image, action, next
image, reward). We show that such off-policy Q-learning from passive data is
able to learn meaningful semantic cues for navigation. These cues, when used in
a hierarchical navigation policy, lead to improved efficiency at the ObjectGoal
task in visually realistic simulations. We observe a relative improvement of
15-83% over end-to-end RL, behavior cloning, and classical methods, while using
minimal direct interaction.
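The core learning signal described above, in which consecutive video frames become (image, action, next image, reward) quadruples whose actions and rewards are pseudo-labeled rather than observed and are then fit with off-policy Q-learning, can be illustrated with a minimal sketch. Everything below (the network architecture, action set, discount factor, and names such as QNetwork and q_learning_step) is an illustrative assumption rather than the authors' implementation, and the pseudo-labeling models themselves are left out.

```python
# Minimal sketch of off-policy Q-learning on pseudo-labeled video transitions.
# All names and hyperparameters here are illustrative placeholders, not the
# authors' code; the models that produce the pseudo-labels are not shown.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS = 4   # e.g. forward / turn-left / turn-right / stop (assumed)
GAMMA = 0.99      # discount factor (assumed)

class QNetwork(nn.Module):
    """Tiny CNN that maps an RGB frame to one Q-value per action."""
    def __init__(self, num_actions=NUM_ACTIONS):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_actions)

    def forward(self, frames):  # frames: (B, 3, H, W)
        return self.head(self.encoder(frames))

def q_learning_step(q_net, target_net, batch, optimizer):
    """One TD update on a batch of pseudo-labeled (s, a, s', r) quadruples."""
    s, a, s_next, r = batch  # tensors built from consecutive video frames
    # Q(s, a) for the pseudo-labeled action taken in each transition.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target from the next frame; no environment interaction needed.
        target = r + GAMMA * target_net(s_next).max(dim=1).values
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Per the abstract, the Q-values learned this way serve as semantic cues inside a hierarchical navigation policy for the ObjectGoal task; how the hierarchy consumes them is not part of this sketch.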
Related papers
- NOLO: Navigate Only Look Once [29.242548047719787]
In this paper, we focus on the video navigation setting, where an in-context navigation policy needs to be learned purely from videos in an offline manner.
We propose Navigate Only Look Once (NOLO), a method for learning a navigation policy that possesses the in-context ability.
We show that our algorithm outperforms baselines by a large margin, which demonstrates the in-context learning ability of the learned policy.
arXiv Detail & Related papers (2024-08-02T16:41:34Z)
- Interactive Semantic Map Representation for Skill-based Visual Object Navigation [43.71312386938849]
This paper introduces a new representation: a scene semantic map formed during the embodied agent's interaction with the indoor environment.
We have implemented this representation into a full-fledged navigation approach called SkillTron.
The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation.
arXiv Detail & Related papers (2023-11-07T16:30:12Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego²-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- Navigating to Objects in the Real World [76.1517654037993]
We present a large-scale empirical study of semantic visual navigation methods comparing methods from classical, modular, and end-to-end learning approaches.
We find that modular learning works well in the real world, attaining a 90% success rate.
In contrast, end-to-end learning does not, dropping from a 77% success rate in simulation to 23% in the real world due to a large image domain gap between simulation and reality.
arXiv Detail & Related papers (2022-12-02T01:10:47Z)
- PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning [125.22462763376993]
We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI).
PONI disentangles the skills of 'where to look?' for an object and 'how to navigate to (x, y)?'.
arXiv Detail & Related papers (2022-01-25T01:07:32Z)
- Pushing it out of the Way: Interactive Visual Navigation [62.296686176988125]
We study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
We introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions.
By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities.
arXiv Detail & Related papers (2021-04-28T22:46:41Z)
- MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation [4.127128889779478]
This work focuses on performing better than, or comparably to, existing learning-based solutions for visual navigation by autonomous agents.
We propose a method to encode vital scene semantics into a semantically informed, top-down egocentric map representation.
We conduct experiments on 3-D reconstructed indoor PointGoal visual navigation and demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-03-21T12:01:23Z)
- ViNG: Learning Open-World Navigation with Visual Goals [82.84193221280216]
We propose a learning-based navigation system for reaching visually indicated goals.
We show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning.
We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
arXiv Detail & Related papers (2020-12-17T18:22:32Z)
- Unsupervised Domain Adaptation for Visual Navigation [115.85181329193092]
We propose an unsupervised domain adaptation method for visual navigation.
Our method translates the images in the target domain to the source domain such that the translation is consistent with the representations learned by the navigation policy.
arXiv Detail & Related papers (2020-10-27T18:22:43Z)