VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
- URL: http://arxiv.org/abs/2401.02695v2
- Date: Tue, 6 Feb 2024 05:15:20 GMT
- Title: VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
- Authors: Pengying Wu, Yao Mu, Bingxian Wu, Yi Hou, Ji Ma, Shanghang Zhang,
Chang Liu
- Abstract summary: VoroNav is a semantic exploration framework that extracts exploratory paths and planning nodes from a semantic map constructed in real time.
By harnessing topological and semantic information, VoroNav designs text-based descriptions of paths and images that are readily interpretable by a large language model.
- Score: 28.79971953667143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of household robotics, the Zero-Shot Object Navigation (ZSON)
task empowers agents to adeptly traverse unfamiliar environments and locate
objects from novel categories without prior explicit training. This paper
introduces VoroNav, a novel semantic exploration framework that proposes the
Reduced Voronoi Graph to extract exploratory paths and planning nodes from a
semantic map constructed in real time. By harnessing topological and semantic
information, VoroNav designs text-based descriptions of paths and images that
are readily interpretable by a large language model (LLM). In particular, our
approach presents a synergy of path and farsight descriptions to represent the
environmental context, enabling the LLM to apply commonsense reasoning to
ascertain waypoints for navigation. Extensive evaluation on HM3D and HSSD
validates that VoroNav surpasses existing baselines in both success rate and
exploration efficiency (absolute improvement: +2.8% Success and +3.7% SPL on
HM3D, +2.6% Success and +3.8% SPL on HSSD). Additional metrics introduced to
evaluate obstacle-avoidance proficiency and perceptual efficiency further
corroborate the improvements our method achieves in ZSON planning. Project page:
https://voro-nav.github.io
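To make the core idea concrete, here is a minimal sketch (not the authors' code) of how a reduced Voronoi graph might be approximated from a 2D occupancy grid with SciPy: obstacle cells seed a Voronoi diagram, edges whose endpoints fall in free space approximate the generalized Voronoi diagram, and junction or endpoint vertices (degree != 2) serve as candidate planning nodes.

```python
# Minimal sketch, assuming a 2D occupancy grid (1 = obstacle, 0 = free);
# this approximates a reduced Voronoi graph and is NOT the VoroNav code.
import numpy as np
from scipy.spatial import Voronoi

def reduced_voronoi_graph(occupancy, free_thresh=0.5):
    obstacles = np.argwhere(occupancy > free_thresh)  # obstacle cells seed the diagram
    vor = Voronoi(obstacles)

    def in_free_space(p):
        r, c = int(round(p[0])), int(round(p[1]))
        return (0 <= r < occupancy.shape[0]
                and 0 <= c < occupancy.shape[1]
                and occupancy[r, c] <= free_thresh)

    # Keep Voronoi edges that stay in free space; they trace the
    # maximally obstacle-clear skeleton of the traversable area.
    edges = []
    for v0, v1 in vor.ridge_vertices:
        if v0 == -1 or v1 == -1:  # drop edges extending to infinity
            continue
        if in_free_space(vor.vertices[v0]) and in_free_space(vor.vertices[v1]):
            edges.append((v0, v1))

    # "Reduce" the graph: keep junctions and dead ends (degree != 2)
    # as planning nodes; degree-2 vertices merely subdivide paths.
    degree = {}
    for v0, v1 in edges:
        for v in (v0, v1):
            degree[v] = degree.get(v, 0) + 1
    nodes = [v for v, d in degree.items() if d != 2]
    return vor.vertices, edges, nodes
```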
Related papers
- TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation [34.85111360243636]
We introduce TopV-Nav, an MLLM-based method that directly reasons on the top-view map with complete spatial information.
To fully unlock the MLLM's spatial reasoning potential in top-view perspective, we propose the Adaptive Visual Prompt Generation (AVPG) method.
Also, we design a Dynamic Map Scaling (DMS) mechanism to dynamically zoom the top-view map to preferred scales.
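As a hedged sketch of what such a map-scaling step could look like (function and parameter names are mine, not the paper's): crop the top-view map around the agent at a chosen zoom factor before handing it to the MLLM.

```python
# Illustrative only: zoom a top-view map around the agent position.
# TopV-Nav's DMS mechanism chooses the scale adaptively; here the
# zoom factor is simply passed in by the caller.
import numpy as np

def zoom_top_view(top_map: np.ndarray, agent_rc: tuple, zoom: float) -> np.ndarray:
    h, w = top_map.shape[:2]
    half_h = max(1, int(h / (2 * zoom)))
    half_w = max(1, int(w / (2 * zoom)))
    r, c = agent_rc
    r0, r1 = max(0, r - half_h), min(h, r + half_h)
    c0, c1 = max(0, c - half_w), min(w, c + half_w)
    return top_map[r0:r1, c0:c1]  # higher zoom -> tighter crop around the agent
```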
arXiv Detail & Related papers (2024-11-25T14:27:55Z)
- IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation [33.979481250363584]
This paper introduces a novel informative path planning and 3D object probability mapping approach.
The mapping module computes the probability of the object of interest through semantic segmentation and a Bayes filter.
Although our planner follows a zero-shot approach, it achieves state-of-the-art performance as measured by the Success weighted by Path Length (SPL) and Soft SPL in the Habitat ObjectNav Challenge 2023.
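As a minimal illustration of the mapping module's idea (my own formulation, not the paper's code), a per-cell binary Bayes filter can fuse semantic-segmentation detections into an object-probability map:

```python
# Minimal per-cell binary Bayes filter; the detector's true/false positive
# rates (p_tp, p_fp) are illustrative assumptions, not values from IPPON.
import numpy as np

def bayes_update(prior, detected, p_tp=0.9, p_fp=0.05):
    # prior: P(object present) per observed cell (np.ndarray)
    # detected: bool array, True where segmentation fired for the target
    p_z_obj = np.where(detected, p_tp, 1.0 - p_tp)   # P(z | object)
    p_z_no = np.where(detected, p_fp, 1.0 - p_fp)    # P(z | no object)
    return p_z_obj * prior / (p_z_obj * prior + p_z_no * (1.0 - prior))
```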
arXiv Detail & Related papers (2024-10-25T17:11:33Z)
- SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation [83.4599149936183]
Existing zero-shot object navigation methods prompt the LLM with text describing spatially close objects.
We propose to represent the observed scene with a 3D scene graph.
We conduct extensive experiments on MP3D, HM3D and RoboTHOR environments, where SG-Nav surpasses previous state-of-the-art zero-shot methods by more than 10% SR on all benchmarks.
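A toy sketch of the prompting idea (the relation vocabulary and prompt wording are assumptions, not SG-Nav's actual format): serialize scene-graph nodes and edges into text an LLM can reason over.

```python
# Toy serialization of a scene graph into an LLM prompt; NOT SG-Nav's format.
def graph_to_prompt(labels, edges, goal):
    # labels: {node_id: object label}; edges: [(id_a, relation, id_b)]
    facts = ". ".join(f"{labels[a]} is {rel} {labels[b]}" for a, rel, b in edges)
    return (f"Scene graph: {facts}.\n"
            f"Which object should the robot approach to find a {goal}? "
            f"Answer with one object label.")

print(graph_to_prompt({0: "sofa", 1: "tv"}, [(0, "facing", 1)], "remote control"))
```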
arXiv Detail & Related papers (2024-10-10T17:57:19Z)
- Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation [64.84996994779443]
We propose a novel Affordances-Oriented Planner for the continuous vision-language navigation (VLN) task.
Our AO-Planner integrates various foundation models to achieve affordances-oriented low-level motion planning and high-level decision-making.
Experiments on the challenging R2R-CE and RxR-CE datasets show that AO-Planner achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-07-08T12:52:46Z)
- GaussNav: Gaussian Splatting for Visual Navigation [92.13664084464514]
Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.
Our framework constructs a novel map representation based on 3D Gaussian Splatting (3DGS).
Our framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset.
arXiv Detail & Related papers (2024-03-18T09:56:48Z)
- Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
- Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Guided Exploration for Zero-Shot Object Navigation [58.3480730643517]
We present LGX, a novel algorithm for Language-Driven Zero-Shot Object Goal Navigation (L-ZSON).
Our approach makes use of Large Language Models (LLMs) for this task.
We achieve state-of-the-art zero-shot object navigation results on RoboTHOR with a success rate (SR) improvement of over 27% over the current baseline.
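To illustrate LLM-guided exploration in this spirit (the model name, prompt format, and helper function are hypothetical, not taken from LGX), an LLM can be queried to rank candidate frontiers against the goal object:

```python
# Hypothetical LLM query for choosing an exploration frontier; the model
# name and prompt wording are assumptions, not the paper's setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def choose_frontier(goal: str, frontier_summaries: list) -> int:
    options = "\n".join(f"{i}: {s}" for i, s in enumerate(frontier_summaries))
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   f"A robot is looking for a {goal}. Candidate frontiers:\n"
                   f"{options}\nReply with only the index of the most "
                   f"promising frontier."}],
    )
    # Assumes the model complies with the "index only" instruction.
    return int(reply.choices[0].message.content.strip())
```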
arXiv Detail & Related papers (2023-03-06T20:19:19Z)
- PEANUT: Predicting and Navigating to Unseen Targets [18.87376347895365]
Efficient ObjectGoal navigation (ObjectNav) in novel environments requires an understanding of the spatial and semantic regularities in environment layouts.
We present a method for learning these regularities by predicting the locations of unobserved objects from incomplete semantic maps.
Our prediction model is lightweight and can be trained in a supervised manner using a relatively small amount of passively collected data.
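A rough stand-in for such a prediction model under my own assumptions (PEANUT's actual network differs): a small convolutional net maps a partial semantic map to per-cell, per-category presence probabilities, trainable with a binary cross-entropy restricted to unexplored cells.

```python
# Rough sketch of a semantic-map completion model; NOT PEANUT's network.
import torch
import torch.nn as nn

class TargetPredictor(nn.Module):
    def __init__(self, num_categories: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_categories, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_categories, 1),   # per-category logits
        )

    def forward(self, partial_map):             # (B, C, H, W) one-hot map
        return torch.sigmoid(self.net(partial_map))  # per-cell probabilities

# Supervised training pair: partial map in, fully observed map as target,
# with the loss computed only on cells the agent has not yet explored.
```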
arXiv Detail & Related papers (2022-12-05T18:58:58Z)
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that describes the route step by step.
This deviates from real-world problems in which a human only describes what the object and its surroundings look like and asks the robot to start navigating from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.