Learning Object Relation Graph and Tentative Policy for Visual Navigation
- URL: http://arxiv.org/abs/2007.11018v1
- Date: Tue, 21 Jul 2020 18:03:05 GMT
- Title: Learning Object Relation Graph and Tentative Policy for Visual Navigation
- Authors: Heming Du, Xin Yu, Liang Zheng
- Abstract summary: It is critical to learn informative visual representation and robust navigation policy.
This paper proposes three complementary techniques: an object relation graph (ORG), trial-driven imitation learning (IL), and a memory-augmented tentative policy network (TPN).
We report 22.8% and 23.5% increases in success rate and Success weighted by Path Length (SPL), respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Target-driven visual navigation aims at navigating an agent towards a given
target based on the observation of the agent. In this task, it is critical to
learn informative visual representation and robust navigation policy. Aiming to
improve these two components, this paper proposes three complementary
techniques, object relation graph (ORG), trial-driven imitation learning (IL),
and a memory-augmented tentative policy network (TPN). ORG improves visual
representation learning by integrating object relationships, including category
closeness and spatial correlations, e.g., a TV usually co-occurs with a remote
spatially. Both Trial-driven IL and TPN underlie robust navigation policy,
instructing the agent to escape from deadlock states, such as looping or being
stuck. Specifically, trial-driven IL is a type of supervision used in policy
network training, while TPN, mimicking the IL supervision in unseen
environment, is applied in testing. Experiment in the artificial environment
AI2-Thor validates that each of the techniques is effective. When combined, the
techniques bring significantly improvement over baseline methods in navigation
effectiveness and efficiency in unseen environments. We report 22.8% and 23.5%
increase in success rate and Success weighted by Path Length (SPL),
respectively. The code is available at
https://github.com/xiaobaishu0097/ECCV-VN.git.
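The abstract reports gains in success rate and SPL. For reference, Success weighted by Path Length follows a standard definition in the embodied-navigation literature (an episode's contribution is its success indicator weighted by the ratio of the shortest-path length to the length the agent actually traversed). The sketch below illustrates both metrics under that common definition; it is not code from the linked repository, and the episode field names are assumed for illustration:

```python
def success_rate(episodes):
    """Fraction of episodes in which the agent reached the target."""
    return sum(e["success"] for e in episodes) / len(episodes)

def spl(episodes):
    """Success weighted by Path Length.

    Each episode provides: success (0 or 1), shortest (optimal path
    length from start to target), and taken (length of the path the
    agent actually traversed).
    """
    total = 0.0
    for e in episodes:
        if e["success"]:
            total += e["shortest"] / max(e["taken"], e["shortest"])
    return total / len(episodes)

episodes = [
    {"success": 1, "shortest": 4.0, "taken": 5.0},   # success, slightly inefficient
    {"success": 1, "shortest": 3.0, "taken": 3.0},   # success, optimal path
    {"success": 0, "shortest": 6.0, "taken": 10.0},  # failure contributes 0
]
print(success_rate(episodes))  # ≈ 0.667
print(spl(episodes))           # (0.8 + 1.0 + 0) / 3 ≈ 0.6
```

SPL is upper-bounded by the success rate: a successful but inefficient episode counts for less than one, so the 23.5% SPL gain reflects improved path efficiency as well as more frequent success.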
Related papers
- DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors [13.700885996266457]
Learning from previously collected data via behavioral cloning or offline reinforcement learning (RL) is a powerful recipe for scaling generalist agents.
We present the DeepMind Control Visual Benchmark (DMC-VB), a dataset collected in the DeepMind Control Suite to evaluate the robustness of offline RL agents.
Accompanying our dataset, we propose three benchmarks to evaluate representation learning methods for pretraining, and carry out experiments on several recently proposed methods.
arXiv Detail & Related papers (2024-09-26T23:07:01Z)
- Aligning Knowledge Graph with Visual Perception for Object-goal Navigation [16.32780793344835]
We propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation.
Our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language description with visual perception.
The integration of a continuous knowledge graph architecture and multimodal feature alignment empowers the navigator with a remarkable zero-shot navigation capability.
arXiv Detail & Related papers (2024-02-29T06:31:18Z)
- Zero-Shot Object Goal Visual Navigation With Class-Independent Relationship Network [3.0820097046465285]
"Zero-shot" means that the target the agent needs to find is not seen during the training phase.
We propose the Class-Independent Relationship Network (CIRN) to address the issue of coupling navigation ability with target features during training.
Our method outperforms the current state-of-the-art approaches in the zero-shot object goal visual navigation task.
arXiv Detail & Related papers (2023-10-15T16:42:14Z)
- Contrastive Instruction-Trajectory Learning for Vision-Language Navigation [66.16980504844233]
A vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction.
Previous works fail to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignore the temporal continuity of sub-instructions.
We propose a Contrastive Instruction-Trajectory Learning framework that explores invariance across similar data samples and variance across different ones to learn distinctive representations for robust navigation.
arXiv Detail & Related papers (2021-12-08T06:32:52Z)
- Learning to Explore by Reinforcement over High-Level Options [0.0]
We propose a new method which grants an agent two intertwined options of behavior: "look-around" and "frontier navigation".
In each timestep, an agent produces an option and a corresponding action according to the policy.
We demonstrate the effectiveness of the proposed method on two publicly available 3D environment datasets.
arXiv Detail & Related papers (2021-11-02T04:21:34Z)
- Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning [126.57680291438128]
We study whether scalability can be achieved via a disentangled representation.
We evaluate semantic tracklets on the visual multi-agent particle environment (VMPE) and on the challenging visual multi-agent GFootball environment.
Notably, this method is the first to successfully learn a strategy for five players in the GFootball environment using only visual data.
arXiv Detail & Related papers (2021-08-06T22:19:09Z)
- Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning.
Our method exploits accessible unlabelled examples during training to estimate their correlation with the labelled examples.
Visual Transformer models non-local visual concept dependency between labelled and unlabelled examples.
arXiv Detail & Related papers (2021-06-07T17:13:59Z)
- Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv Detail & Related papers (2021-05-17T13:42:09Z)
- Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships [52.72020203771489]
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes.
Our proposed method combines visual features and 3D spatial representations to learn navigation policy.
Our experiments, performed in the AI2-THOR, show that our model outperforms the baselines in both SR and SPL metrics.
arXiv Detail & Related papers (2020-04-29T08:46:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.