Agent-Centric Relation Graph for Object Visual Navigation
- URL: http://arxiv.org/abs/2111.14422v3
- Date: Mon, 21 Aug 2023 03:13:12 GMT
- Title: Agent-Centric Relation Graph for Object Visual Navigation
- Authors: Xiaobo Hu, Youfang Lin, Shuo Wang, Zhihao Wu, Kai Lv
- Abstract summary: We introduce an Agent-Centric Relation Graph (ACRG) for learning the visual representation based on the relationships in the environment.
ACRG is a highly effective structure that consists of two relationships, i.e., the horizontal relationship among objects and the distance relationship between the agent and objects.
With the above graphs, the agent can perceive the environment and output navigation actions.
- Score: 25.097165101483284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object visual navigation aims to steer an agent toward a target object based
on visual observations. It is highly desirable to reasonably perceive the
environment and accurately control the agent. In the navigation task, we
introduce an Agent-Centric Relation Graph (ACRG) for learning the visual
representation based on the relationships in the environment. ACRG is a highly
effective structure that consists of two relationships, i.e., the horizontal
relationship among objects and the distance relationship between the agent and
objects. On the one hand, we design the Object Horizontal Relationship Graph
(OHRG) that stores the relative horizontal location among objects. On the other
hand, we propose the Agent-Target Distance Relationship Graph (ATDRG) that
enables the agent to perceive the distance between the target and objects. For
ATDRG, we utilize image depth to obtain the target distance and use the
vertical location to capture the distance relationship among objects in the
vertical direction. With the above graphs, the agent can perceive the
environment and output navigation actions. Experimental results in the
artificial environment AI2-THOR demonstrate that ACRG significantly outperforms
other state-of-the-art methods in unseen testing environments.
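The abstract describes the structure only at a high level; as an illustration, the sketch below shows one way such graphs could be assembled. It is a minimal, hypothetical PyTorch example (the functions build_ohrg and build_atdrg, the ACRGBlock module, and the detection/depth formats are assumptions, not the authors' released code): pairwise horizontal offsets between detected object centers form an OHRG-style adjacency, depth sampled at each object plus its vertical position forms ATDRG-style distance features, and a small attention-style layer fuses both into a relation-aware representation.

```python
# Hypothetical sketch of an agent-centric relation graph (not the authors' code).
# Assumes detections as (x_center, y_center) box centers in [0, 1] and a dense depth map.
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_ohrg(centers: torch.Tensor) -> torch.Tensor:
    """Object Horizontal Relationship Graph (assumed form):
    edge weights from pairwise horizontal offsets between object centers."""
    dx = centers[:, 0].unsqueeze(0) - centers[:, 0].unsqueeze(1)   # (N, N) signed offsets
    return torch.softmax(-dx.abs(), dim=-1)                        # horizontally closer objects weigh more

def build_atdrg(centers: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
    """Agent-Target Distance Relationship Graph (assumed form):
    per-object depth sampled at each box center plus its vertical image position."""
    h, w = depth.shape
    ys = (centers[:, 1] * (h - 1)).long().clamp(0, h - 1)
    xs = (centers[:, 0] * (w - 1)).long().clamp(0, w - 1)
    d = depth[ys, xs]                                              # (N,) agent-to-object distance
    return torch.stack([d, centers[:, 1]], dim=-1)                 # (N, 2) distance + vertical location

class ACRGBlock(nn.Module):
    """Fuses object features with both relation graphs via one attention-style update."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.proj = nn.Linear(feat_dim + 2, hidden)   # +2 for the ATDRG distance features
        self.out = nn.Linear(hidden, hidden)

    def forward(self, obj_feats, centers, depth):
        adj = build_ohrg(centers)                      # (N, N) horizontal relations among objects
        dist = build_atdrg(centers, depth)             # (N, 2) agent-centric distance features
        x = self.proj(torch.cat([obj_feats, dist], dim=-1))
        x = adj @ x                                    # aggregate neighbors by horizontal relation
        return F.relu(self.out(x))                     # (N, hidden) relation-aware representation

# Usage with dummy data: 5 detected objects, 256-d features, a 64x64 depth map.
feats, centers, depth = torch.randn(5, 256), torch.rand(5, 2), torch.rand(64, 64)
print(ACRGBlock(256)(feats, centers, depth).shape)     # torch.Size([5, 64])
```

In the paper's pipeline this kind of relation-aware representation would be fed to the navigation policy that outputs actions; the fusion rule here is only one plausible choice.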
Related papers
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z)
- Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation [11.372544701050044]
Vision-and-Language Navigation (VLN) is a challenging task where an agent is required to navigate to a natural language described location via vision observations.
The navigation abilities of the agent can be enhanced by the relations between objects, which are usually learned using internal objects or external datasets.
arXiv Detail & Related papers (2024-03-23T02:44:43Z)
- Building Category Graphs Representation with Spatial and Temporal Attention for Visual Navigation [35.13932194789583]
Given an object of interest, visual navigation aims to reach the object's location based on a sequence of partial observations.
To this end, an agent needs to 1) learn knowledge about the relations of object categories in the world during training and 2) look for the target object based on the pre-learned object category relations and its moving trajectory in the current unseen environment.
We propose a Category Relation Graph (CRG) to learn the knowledge of object category layout relations and a Temporal-Spatial-Region (TSR) attention architecture to perceive the long-term spatial-temporal dependencies of objects that aid navigation.
arXiv Detail & Related papers (2023-12-06T07:28:43Z)
- Zero-Shot Object Goal Visual Navigation With Class-Independent Relationship Network [3.0820097046465285]
"Zero-shot" means that the target the agent needs to find is not trained during the training phase.
We propose the Class-Independent Relationship Network (CIRN) to address the issue of coupling navigation ability with target features during training.
Our method outperforms the current state-of-the-art approaches in the zero-shot object goal visual navigation task.
arXiv Detail & Related papers (2023-10-15T16:42:14Z)
- Task-Driven Graph Attention for Hierarchical Relational Object Navigation [25.571175038938527]
Embodied AI agents in large scenes often need to navigate to find objects.
We study a naturally emerging variant of the object navigation task, hierarchical relational object navigation (HRON).
We propose a solution that uses scene graphs as part of its input and integrates graph neural networks as its backbone.
arXiv Detail & Related papers (2023-06-23T19:50:48Z)
- Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
- Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method that exploits scene graph information for the Human-Object Interaction detection task (SG2HOI).
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that depicts the route step by step.
This approach deviates from real-world problems in which a human only describes what the object and its surroundings look like and asks the robot to start navigation from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
- Language and Visual Entity Relationship Graph for Agent Navigation [54.059606864535304]
Vision-and-Language Navigation (VLN) requires an agent to navigate in a real-world environment following natural language instructions.
We propose a novel Language and Visual Entity Relationship Graph for modelling the inter-modal relationships between text and vision.
Experiments show that, by taking advantage of these relationships, we are able to improve over the state of the art.
arXiv Detail & Related papers (2020-10-19T08:25:55Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
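As a rough illustration of the joint visual-semantic embedding described in the ConsNet summary above, the sketch below is a minimal, hypothetical example (the class name, dimensions, and scoring rule are assumptions, not the paper's architecture): pair-level visual features and HOI label word embeddings are projected into a shared space and scored by cosine similarity, which is what allows unseen HOI labels to be scored at test time.

```python
# Hypothetical sketch of visual-semantic joint embedding scoring (not the ConsNet code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbeddingScorer(nn.Module):
    """Projects pair features and label embeddings into a shared space; scores by cosine similarity."""
    def __init__(self, visual_dim: int = 1024, text_dim: int = 300, joint_dim: int = 256):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, joint_dim)   # human-object pair features -> joint space
        self.text_proj = nn.Linear(text_dim, joint_dim)       # HOI label word embeddings -> joint space

    def forward(self, pair_feats: torch.Tensor, label_embs: torch.Tensor) -> torch.Tensor:
        v = F.normalize(self.visual_proj(pair_feats), dim=-1)  # (P, joint_dim) pair embeddings
        t = F.normalize(self.text_proj(label_embs), dim=-1)    # (L, joint_dim) label embeddings
        return v @ t.T                                          # (P, L) cosine-similarity scores

# Usage: 4 candidate human-object pairs scored against 10 HOI labels (seen or unseen).
scorer = JointEmbeddingScorer()
scores = scorer(torch.randn(4, 1024), torch.randn(10, 300))
print(scores.shape)   # torch.Size([4, 10])
```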