UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
- URL: http://arxiv.org/abs/2503.10630v3
- Date: Tue, 18 Mar 2025 10:07:07 GMT
- Title: UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
- Authors: Hang Yin, Xiuwei Xu, Lingqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu
- Abstract summary: We propose a general framework for universal zero-shot goal-oriented navigation. We propose a uniform graph representation to unify different goals, including object category, instance image and text description. Our UniGoal achieves state-of-the-art zero-shot performance on three studied navigation tasks with a single model.
- Score: 68.45058159533376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a general framework for universal zero-shot goal-oriented navigation. Existing zero-shot methods build their inference frameworks upon large language models (LLMs) for specific tasks, so their pipelines differ substantially and fail to generalize across different types of goals. Toward universal zero-shot navigation, we propose a uniform graph representation that unifies different goals, including object category, instance image and text description. We also convert the agent's observations into an online maintained scene graph. With this consistent scene and goal representation, we preserve most structural information compared with pure text and can leverage an LLM for explicit graph-based reasoning. Specifically, we conduct graph matching between the scene graph and the goal graph at each time instant and propose different strategies to generate the long-term exploration goal according to the matching state. When there is no match, the agent iteratively searches for a subgraph of the goal. Under a partial match, the agent utilizes coordinate projection and anchor pair alignment to infer the goal location. Finally, under a perfect match, scene graph correction and goal verification are applied. We also present a blacklist mechanism to enable robust switching between stages. Extensive experiments on several benchmarks show that our UniGoal achieves state-of-the-art zero-shot performance on the three studied navigation tasks with a single model, even outperforming task-specific zero-shot methods and supervised universal methods.
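The staged policy described above (zero, partial and perfect matching, plus the blacklist for robust stage switching) can be summarized as a small decision procedure. The toy sketch below reduces graphs to node-label sets, and every name in it is illustrative; UniGoal's actual matching and reasoning are performed by an LLM over full scene and goal graphs, which this sketch deliberately omits.

```python
# Toy sketch of UniGoal's matching-state policy (illustrative, not the
# authors' code): graphs are reduced to node-label sets for brevity.
def match_state(scene_nodes: set, goal_nodes: set) -> str:
    overlap = scene_nodes & goal_nodes
    if not overlap:
        return "zero"
    return "perfect" if overlap == goal_nodes else "partial"

def choose_strategy(scene_nodes: set, goal_nodes: set, blacklist: set) -> str:
    # Blacklisted goal nodes come from failed verifications; skipping them
    # lets the agent fall back to an earlier stage instead of looping.
    state = match_state(scene_nodes, goal_nodes - blacklist)
    if state == "zero":
        # Stage 1: explore toward any plausible subgraph of the goal.
        return "search_goal_subgraph"
    if state == "partial":
        # Stage 2: matched nodes act as anchors; project goal coordinates.
        return "infer_location_from_anchors"
    # Stage 3: every goal node matched; correct the scene graph, then verify.
    return "correct_and_verify_goal"

# Example: the goal graph contains a chair next to a table, but only the
# table has been observed so far -> partial match -> anchor-based inference.
print(choose_strategy({"table", "sofa"}, {"chair", "table"}, set()))
```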
Related papers
- Aligning Knowledge Graph with Visual Perception for Object-goal Navigation [16.32780793344835]
We propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation.
Our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language description with visual perception.
The integration of a continuous knowledge graph architecture and multimodal feature alignment empowers the navigator with a remarkable zero-shot navigation capability.
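As a rough illustration of what such visual-language alignment buys at navigation time, the sketch below scores observed regions against a goal description in a shared embedding space. The embeddings are random placeholders standing in for a pretrained vision-language model (e.g. CLIP); AKGVP's actual model and scoring are not specified here.

```python
# Minimal sketch of zero-shot region scoring via vision-language alignment.
# Random vectors stand in for embeddings from a pretrained model (assumption).
import numpy as np

rng = np.random.default_rng(0)
text_emb = rng.normal(size=512)           # embedding of "a wooden chair"
region_embs = rng.normal(size=(5, 512))   # embeddings of 5 observed regions

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(text_emb, r) for r in region_embs]
best = int(np.argmax(scores))
print(f"navigate toward region {best} (score {scores[best]:.3f})")
```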
arXiv Detail & Related papers (2024-02-29T06:31:18Z)
- ZeroReg: Zero-Shot Point Cloud Registration with Foundation Models [77.84408427496025]
State-of-the-art 3D point cloud registration methods rely on labeled 3D datasets for training. We introduce ZeroReg, a zero-shot registration approach that utilizes 2D foundation models to predict 3D correspondences.
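The summary suggests a general recipe: match pixels across views with a 2D foundation model, back-project the matches using depth, and solve a rigid alignment. The sketch below covers only the geometric tail of that pipeline, with synthetic matches and a standard Kabsch solver; it is not ZeroReg's code, and the 2D matching step is omitted.

```python
# Sketch: lift 2D pixel matches to 3D with depth + intrinsics, then solve
# a least-squares rigid transform (Kabsch). Data below is synthetic.
import numpy as np

def backproject(uv, depth, K):
    """Pixel coords (N,2) + depth (N,) -> camera-frame 3D points (N,3)."""
    uv1 = np.hstack([uv, np.ones((len(uv), 1))])
    return (np.linalg.inv(K) @ uv1.T).T * depth[:, None]

def kabsch(P, Q):
    """Rotation R and translation t minimizing ||R P + t - Q||."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # reflection guard
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, Q.mean(0) - R @ P.mean(0)

K = np.array([[500., 0, 320], [0, 500., 240], [0, 0, 1]])
uv = np.array([[100., 120], [300, 200], [420, 380], [50, 400]])
depth = np.array([1.2, 2.0, 1.5, 2.4])
P = backproject(uv, depth, K)
Q = P + np.array([0.1, -0.05, 0.2])          # same points, shifted frame
R, t = kabsch(P, Q)
print(np.allclose(R @ P.T + t[:, None], Q.T, atol=1e-6))  # True
```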
arXiv Detail & Related papers (2023-12-05T11:33:16Z)
- Zero-Shot Object Goal Visual Navigation With Class-Independent Relationship Network [3.0820097046465285]
"Zero-shot" means that the target the agent needs to find is not trained during the training phase.
We propose the Class-Independent Relationship Network (CIRN) to address the issue of coupling navigation ability with target features during training.
Our method outperforms the current state-of-the-art approaches in the zero-shot object goal visual navigation task.
arXiv Detail & Related papers (2023-10-15T16:42:14Z)
- ReVoLT: Relational Reasoning and Voronoi Local Graph Planning for Target-driven Navigation [1.0896567381206714]
Embodied AI is an inevitable trend that emphasizes the interaction between intelligent entities and the real world.
Recent works focus on exploiting layout relationships via graph neural networks (GNNs).
We decouple this task and propose ReVoLT, a hierarchical framework.
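One way to picture the "Voronoi local graph" part of the title: waypoints that keep clearance from sampled obstacle points form a sparse roadmap for local planning. The layout and clearance threshold below are invented, and ReVoLT's hierarchical relational reasoning is not reproduced.

```python
# Sketch of a Voronoi-based local roadmap (illustrative, not ReVoLT code).
import numpy as np
from scipy.spatial import Voronoi, cKDTree

rng = np.random.default_rng(1)
obstacles = rng.uniform(0, 10, size=(40, 2))   # sampled obstacle points
vor = Voronoi(obstacles)

# Keep Voronoi vertices with a safe clearance from every obstacle: these
# lie "between" obstacles and make natural waypoints for a local graph.
tree = cKDTree(obstacles)
clearance, _ = tree.query(vor.vertices)
waypoints = vor.vertices[clearance > 0.5]
print(f"{len(waypoints)} waypoints with clearance > 0.5")
```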
arXiv Detail & Related papers (2023-01-06T05:19:56Z)
- GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting [121.42898228997538]
We propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization.
We leverage pair-wise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph.
Our decoder is also viewpoint agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction.
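A sketch of what pair-wise relative positional encodings can look like: each neighbor's offset and heading are expressed in every agent's own frame, which is what makes the resulting features viewpoint-invariant. The 2D state and raw-offset encoding below are simplifications rather than GoRela's actual features.

```python
# Sketch of pair-wise relative positional encodings (not GoRela's code).
import numpy as np

def relative_encoding(pos, heading):
    """pos: (N,2) world positions; heading: (N,) yaw. Returns (N,N,3):
    neighbor offsets rotated into each agent's frame + relative heading."""
    dx = pos[None, :, :] - pos[:, None, :]       # (N,N,2) world-frame offsets
    c, s = np.cos(-heading), np.sin(-heading)    # rotate by -yaw per agent
    rot = np.stack([np.stack([c, -s], -1), np.stack([s, c], -1)], -2)  # (N,2,2)
    local = np.einsum('nij,nmj->nmi', rot, dx)   # offsets in each agent's frame
    dpsi = (heading[None, :] - heading[:, None] + np.pi) % (2 * np.pi) - np.pi
    return np.concatenate([local, dpsi[..., None]], -1)

pos = np.array([[0., 0.], [5., 0.], [0., 5.]])
yaw = np.array([0., np.pi / 2, np.pi])
print(relative_encoding(pos, yaw).shape)         # (3, 3, 3)
```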
arXiv Detail & Related papers (2022-11-04T16:10:50Z)
- Self-supervised Graph-level Representation Learning with Local and Global Structure [71.45196938842608]
We propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning.
Besides preserving the local similarities, GraphLoG introduces the hierarchical prototypes to capture the global semantic clusters.
An efficient online expectation-maximization (EM) algorithm is further developed for learning the model.
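The online EM idea can be pictured as a streaming assign-and-update loop: the E-step assigns each graph embedding to its nearest prototype, and the M-step nudges that prototype toward the embedding. The flat prototypes and toy embeddings below are simplifications; GraphLoG learns hierarchical prototypes jointly with a GNN encoder.

```python
# Toy online EM over global prototypes (illustrative, not GraphLoG's code).
import numpy as np

rng = np.random.default_rng(0)
protos = rng.normal(size=(4, 16))        # K=4 global semantic prototypes
for _ in range(100):                     # stream of graph embeddings
    z = rng.normal(size=16)              # embedding of one graph (toy)
    k = np.argmin(np.linalg.norm(protos - z, axis=1))   # E-step: assign
    protos[k] = 0.99 * protos[k] + 0.01 * z             # M-step: EMA update
print(np.round(np.linalg.norm(protos, axis=1), 2))
```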
arXiv Detail & Related papers (2021-06-08T05:25:38Z)
- Graph Attention Tracking [76.19829750144564]
We propose a simple target-aware Siamese graph attention network for general object tracking.
Experiments on challenging benchmarks including GOT-10k, UAV123, OTB-100 and LaSOT demonstrate that the proposed SiamGAT outperforms many state-of-the-art trackers.
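The general mechanism behind such target-aware graph attention can be sketched as cross-attention from search-region nodes to template (target) nodes, propagating target information into the search features. The tensors below are random stand-ins, not SiamGAT's real features or architecture.

```python
# Sketch of template-to-search cross attention (not SiamGAT's actual model).
import numpy as np

rng = np.random.default_rng(0)
template = rng.normal(size=(9, 64))    # 3x3 target-patch features (nodes)
search = rng.normal(size=(625, 64))    # 25x25 search-region features (nodes)

attn = search @ template.T / np.sqrt(64)          # (625, 9) similarities
attn = np.exp(attn - attn.max(1, keepdims=True))
attn /= attn.sum(1, keepdims=True)                # softmax over template nodes
target_aware = attn @ template                    # propagate target info
print(target_aware.shape)                         # (625, 64): fused features
```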
arXiv Detail & Related papers (2020-11-23T04:26:45Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
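The interactive reasoning can be pictured as message passing on a bipartite human-object graph: human and object node features exchange context before HOI classification. The single similarity-weighted update below is illustrative, not in-GraphNet's actual design.

```python
# Sketch of bipartite human-object message passing (not in-GraphNet's code).
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 128))                 # features of 2 detected humans
o = rng.normal(size=(3, 128))                 # features of 3 detected objects
W = rng.normal(size=(128, 128)) / np.sqrt(128)

sim = h @ o.T / np.sqrt(128)                  # (2, 3) human-object affinities
h_new = h + softmax(sim, 1) @ o @ W           # humans gather object context
o_new = o + softmax(sim.T, 1) @ h @ W         # objects gather human context
print(h_new.shape, o_new.shape)               # (2, 128) (3, 128)
```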
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.