Zero-shot object goal visual navigation
- URL: http://arxiv.org/abs/2206.07423v1
- Date: Wed, 15 Jun 2022 09:53:43 GMT
- Title: Zero-shot object goal visual navigation
- Authors: Qianfan Zhao, Lu Zhang, Bin He, Hong Qiao, and Zhiyong Liu
- Abstract summary: In real households, there may exist numerous object classes that the robot needs to deal with.
We propose a zero-shot object navigation task by combining zero-shot learning with object goal visual navigation.
Our model outperforms the baseline models in both seen and unseen classes.
- Score: 15.149900666249096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object goal visual navigation is a challenging task that aims to guide a
robot to find the target object only based on its visual observation, and the
target is limited to the classes specified in the training stage. However, in
real households, there may exist numerous object classes that the robot needs
to deal with, and it is hard to include all of these classes in the
training stage. To address this challenge, we propose a zero-shot object
navigation task by combining zero-shot learning with object goal visual
navigation, which aims at guiding robots to find objects belonging to novel
classes without any training samples. This task requires the learned policy to
generalize to novel classes, an issue that has received little attention in
object navigation with deep reinforcement learning. To address this issue,
we utilize "class-unrelated" data as input to alleviate the overfitting of the
classes specified in the training stage. The class-unrelated input consists of
detection results and cosine similarity of word embeddings, and does not
contain any class-related visual features or knowledge graphs. Extensive
experiments on the AI2-THOR platform show that our model outperforms the
baseline models in both seen and unseen classes, which proves that our model is
less class-sensitive and generalizes better. Our code is available at
https://github.com/pioneer-innovation/Zero-Shot-Object-Navigation
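As a rough illustration of the class-unrelated input described above, the sketch below scores each detected object by the cosine similarity between its class word embedding and the target class's word embedding, and combines that score with the detection geometry and confidence. This is a minimal sketch assuming pre-computed word embeddings (e.g., GloVe vectors) are available as a dictionary; the function and argument names are illustrative and are not taken from the authors' repository.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two word-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def class_unrelated_input(detections, target_class, embeddings):
    """Build a class-unrelated observation from object-detector outputs.

    detections   : list of (class_name, confidence, bbox) tuples, with
                   bbox = (x1, y1, x2, y2) in normalized image coordinates
    target_class : name of the (possibly unseen) goal class
    embeddings   : dict mapping class names to word-embedding vectors

    Each detection is encoded by its geometry, detector confidence, and the
    cosine similarity between its class embedding and the target embedding,
    so no class-specific visual feature enters the observation.
    """
    target_vec = embeddings[target_class]
    rows = []
    for class_name, confidence, bbox in detections:
        class_vec = embeddings.get(class_name)
        if class_vec is None:
            continue  # skip detections whose class has no embedding
        sim = cosine_similarity(class_vec, target_vec)
        rows.append([*bbox, confidence, sim])
    # shape: (num_detections, 6) -> x1, y1, x2, y2, confidence, similarity
    return np.asarray(rows, dtype=np.float32).reshape(-1, 6)
```

A navigation policy could consume the resulting per-detection rows directly, so that unseen target classes are handled purely through their word embeddings rather than through class-specific visual features.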
Related papers
- Language-Based Augmentation to Address Shortcut Learning in Object Goal Navigation [0.0]
We aim to deepen our understanding of shortcut learning in ObjectNav.
We observe poor generalization of a state-of-the-art (SOTA) ObjectNav method to environments where the shortcut it relies on is absent.
We find that shortcut learning is the root cause: the agent learns to navigate to target objects simply by searching for the wall color associated with the target object's room.
arXiv Detail & Related papers (2024-02-07T18:44:27Z)
- Zero-Shot Object Goal Visual Navigation With Class-Independent Relationship Network [3.0820097046465285]
"Zero-shot" means that the target the agent needs to find is not trained during the training phase.
We propose the Class-Independent Relationship Network (CIRN) to address the issue of coupling navigation ability with target features during training.
Our method outperforms the current state-of-the-art approaches in the zero-shot object goal visual navigation task.
arXiv Detail & Related papers (2023-10-15T16:42:14Z)
- Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
- OVTrack: Open-Vocabulary Multiple Object Tracking [64.73379741435255]
OVTrack is an open-vocabulary tracker capable of tracking arbitrary object classes.
It sets a new state-of-the-art on the large-scale, large-vocabulary TAO benchmark.
arXiv Detail & Related papers (2023-04-17T16:20:05Z)
- Is an Object-Centric Video Representation Beneficial for Transfer? [86.40870804449737]
We introduce a new object-centric video recognition model on a transformer architecture.
We show that the object-centric model outperforms prior video representations.
arXiv Detail & Related papers (2022-07-20T17:59:44Z)
- Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation [97.17517060585875]
We present a unified approach to visual navigation using a novel modular transfer learning model.
Our model can effectively leverage its experience from one source task and apply it to multiple target tasks.
Our approach learns faster, generalizes better, and outperforms SoTA models by a significant margin.
arXiv Detail & Related papers (2022-02-05T00:07:21Z)
- Task-Focused Few-Shot Object Detection for Robot Manipulation [1.8275108630751844]
We develop a manipulation method based solely on detection, then introduce task-focused few-shot object detection to learn new objects and settings.
In experiments with our interactive approach to few-shot learning, we train a robot to manipulate objects directly from detection (ClickBot).
arXiv Detail & Related papers (2022-01-28T21:52:05Z)
- Robust Region Feature Synthesizer for Zero-Shot Object Detection [87.79902339984142]
We build a novel zero-shot object detection framework that contains an Intra-class Semantic Diverging component and an Inter-class Structure Preserving component.
It is the first study to carry out zero-shot object detection in remote sensing imagery.
arXiv Detail & Related papers (2022-01-01T03:09:15Z)
- Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class semantics not only to generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
- Exploiting Scene-specific Features for Object Goal Navigation [9.806910643086043]
We introduce a new reduced dataset that speeds up the training of navigation models.
Our proposed dataset permits models that do not exploit online-built maps to be trained in a reasonable amount of time.
We propose the SMTSC model, an attention-based model capable of exploiting the correlation between scenes and objects contained in them.
arXiv Detail & Related papers (2020-08-21T10:16:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.