Task-Driven Graph Attention for Hierarchical Relational Object Navigation
- URL: http://arxiv.org/abs/2306.13760v1
- Date: Fri, 23 Jun 2023 19:50:48 GMT
- Title: Task-Driven Graph Attention for Hierarchical Relational Object Navigation
- Authors: Michael Lingelbach, Chengshu Li, Minjune Hwang, Andrey Kurenkov, Alan Lou, Roberto Martín-Martín, Ruohan Zhang, Li Fei-Fei, Jiajun Wu
- Abstract summary: Embodied AI agents in large scenes often need to navigate to find objects.
We study a naturally emerging variant of the object navigation task, hierarchical relational object navigation (HRON).
We propose a solution that uses scene graphs as part of its input and integrates graph neural networks as its backbone.
- Score: 25.571175038938527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied AI agents in large scenes often need to navigate to find objects. In
this work, we study a naturally emerging variant of the object navigation task,
hierarchical relational object navigation (HRON), where the goal is to find
objects specified by logical predicates organized in a hierarchical structure -
objects related to furniture and then to rooms - such as finding an apple on
top of a table in the kitchen. Solving such a task requires an efficient
representation to reason about object relations and correlate the relations in
the environment and in the task goal. HRON in large scenes (e.g. homes) is
particularly challenging due to its partial observability and long horizon,
which invites solutions that can compactly store the past information while
effectively exploring the scene. We demonstrate experimentally that scene
graphs are the best-suited representation compared to conventional
representations such as images or 2D maps. We propose a solution that uses scene graphs as part of its input, with graph neural networks as its backbone and an integrated task-driven attention mechanism, and demonstrate that it scales better and learns more efficiently than state-of-the-art baselines.
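To make the described architecture concrete, below is a minimal, hypothetical PyTorch sketch of task-driven attention over scene-graph nodes: messages are passed along scene-graph edges, and each node is scored against an embedding of the task goal so the policy can focus on goal-relevant parts of the graph. All names and dimensions (TaskDrivenGraphAttention, node_dim, goal_dim) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch) of task-driven attention over scene-graph nodes.
# This is an illustrative assumption, NOT the paper's released code:
# names, dimensions, and the message-passing scheme are all hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskDrivenGraphAttention(nn.Module):
    def __init__(self, node_dim: int, goal_dim: int, hidden_dim: int = 64):
        super().__init__()
        # One round of message passing over scene-graph edges.
        self.message = nn.Linear(2 * node_dim, hidden_dim)
        # Attention scores conditioned on the task-goal embedding.
        self.attn = nn.Linear(hidden_dim + goal_dim, 1)

    def forward(self, nodes, edges, goal):
        # nodes: (N, node_dim) features of scene-graph nodes
        # edges: (E, 2) long tensor of (src, dst) indices
        # goal:  (goal_dim,) embedding of the hierarchical task goal
        src, dst = edges[:, 0], edges[:, 1]
        msg = F.relu(self.message(torch.cat([nodes[src], nodes[dst]], dim=-1)))
        # Sum incoming messages per destination node.
        h = torch.zeros(nodes.size(0), msg.size(-1))
        h = h.index_add(0, dst, msg)
        # Task-driven attention: score each node against the goal embedding.
        g = goal.expand(h.size(0), -1)
        scores = self.attn(torch.cat([h, g], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=0)            # (N,)
        graph_embedding = (weights.unsqueeze(-1) * h).sum(0)
        return graph_embedding, weights

# Toy usage: 4 nodes (e.g. kitchen, table, apple, agent) and 3 edges.
model = TaskDrivenGraphAttention(node_dim=8, goal_dim=8)
nodes = torch.randn(4, 8)
edges = torch.tensor([[0, 1], [1, 2], [3, 0]])
goal = torch.randn(8)
emb, attn = model(nodes, edges, goal)
print(emb.shape, attn)
```

In the HRON setting, the goal embedding could plausibly encode the hierarchical predicate (e.g., an apple on a table in the kitchen), and the attention weights then indicate which scene-graph nodes the policy should attend to.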
Related papers
- On Support Relations Inference and Scene Hierarchy Graph Construction from Point Cloud in Clustered Environments [3.4535508414601344]
In 3D scenes, rich spatial, geometric, and topological information is often ignored by RGB-based approaches to scene understanding.
In this study, we develop a bottom-up approach for scene understanding that infers support relations between objects from a point cloud.
arXiv Detail & Related papers (2024-04-22T02:42:32Z)
- Aligning Knowledge Graph with Visual Perception for Object-goal Navigation [16.32780793344835]
We propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation.
Our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language description with visual perception.
The integration of a continuous knowledge graph architecture and multimodal feature alignment empowers the navigator with a remarkable zero-shot navigation capability.
arXiv Detail & Related papers (2024-02-29T06:31:18Z)
- How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI.
Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework.
Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
arXiv Detail & Related papers (2023-05-26T13:38:33Z)
- Learning-based Relational Object Matching Across Views [63.63338392484501]
We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
arXiv Detail & Related papers (2023-05-03T19:36:51Z)
- Agent-Centric Relation Graph for Object Visual Navigation [25.097165101483284]
We introduce an Agent-Centric Relation Graph (ACRG) for learning the visual representation based on the relationships in the environment.
ACRG is a highly effective structure that consists of two relationships, i.e., the horizontal relationship among objects and the distance relationship between the agent and objects.
With the above graphs, the agent can perceive the environment and output navigation actions.
arXiv Detail & Related papers (2021-11-29T10:06:31Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of instance segmentation methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of step-by-step instructions.
This setting deviates from real-world problems, in which a human only describes what the object and its surroundings look like and asks the robot to start navigation from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
- Semantic and Geometric Modeling with Neural Message Passing in 3D Scene Graphs for Hierarchical Mechanical Search [48.655167907740136]
We use a 3D scene graph representation to capture the hierarchical, semantic, and geometric aspects of this problem.
We introduce Hierarchical Mechanical Search (HMS), a method that guides an agent's actions towards finding a target object specified with a natural language description.
HMS is evaluated on a novel dataset of 500 3D scene graphs with dense placements of semantically related objects in storage locations.
arXiv Detail & Related papers (2020-12-07T21:04:34Z)
- Exploiting Scene-specific Features for Object Goal Navigation [9.806910643086043]
We introduce a new reduced dataset that speeds up the training of navigation models.
Our proposed dataset permits training models that do not exploit online-built maps in a reasonable amount of time.
We propose SMTSC, an attention-based model capable of exploiting the correlation between scenes and the objects contained in them.
arXiv Detail & Related papers (2020-08-21T10:16:01Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) detection, which aims to locate and recognize HOI instances in the form of <human, action, object> triplets in images.
We argue that multi-level consistencies among objects, actions, and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into a visual-semantic joint embedding space, and obtains detection results by measuring their similarities; a minimal sketch of this joint-embedding idea follows the list.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
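As referenced in the ConsNet summary above, here is a minimal, hypothetical sketch of visual-semantic joint-embedding scoring for zero-shot HOI recognition. The projection sizes, the name JointEmbeddingScorer, and the cosine-similarity scoring rule are all assumptions for illustration, not the paper's actual model.

```python
# Hypothetical sketch of a visual-semantic joint embedding for zero-shot
# HOI scoring; names, dimensions, and scoring rule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbeddingScorer(nn.Module):
    def __init__(self, visual_dim: int, word_dim: int, joint_dim: int = 128):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, joint_dim)  # human-object pair features
        self.label_proj = nn.Linear(word_dim, joint_dim)     # HOI label word embeddings

    def forward(self, pair_feats, label_embs):
        # pair_feats: (P, visual_dim), label_embs: (L, word_dim)
        v = F.normalize(self.visual_proj(pair_feats), dim=-1)
        t = F.normalize(self.label_proj(label_embs), dim=-1)
        # Cosine similarity between every candidate pair and every HOI label;
        # unseen labels can be scored the same way, enabling zero-shot detection.
        return v @ t.T  # (P, L) similarity scores

scorer = JointEmbeddingScorer(visual_dim=256, word_dim=300)
scores = scorer(torch.randn(5, 256), torch.randn(10, 300))
print(scores.shape)  # torch.Size([5, 10])
```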