RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models
- URL: http://arxiv.org/abs/2506.02354v1
- Date: Tue, 03 Jun 2025 01:15:00 GMT
- Title: RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models
- Authors: Junjie Li, Nan Zhang, Xiaoyang Qu, Kai Lu, Guokuan Li, Jiguang Wan, Jianzong Wang
- Abstract summary: A critical but underexplored direction is the timely termination of exploration to overcome these challenges. We propose RATE-Nav, a Region-Aware Termination-Enhanced method. It includes a geometric predictive region segmentation algorithm and a region-based exploration estimation algorithm for exploration rate calculation. It achieves a success rate of 67.8% and an SPL of 31.3% on the HM3D dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object Navigation (ObjectNav) is a fundamental task in embodied artificial intelligence. Although significant progress has been made in semantic map construction and target direction prediction in current research, redundant exploration and exploration failures remain inevitable. A critical but underexplored direction is the timely termination of exploration to overcome these challenges. We observe a diminishing marginal effect between exploration steps and exploration rates and analyze the cost-benefit relationship of exploration. Inspired by this, we propose RATE-Nav, a Region-Aware Termination-Enhanced method. It includes a geometric predictive region segmentation algorithm and a region-based exploration estimation algorithm for exploration rate calculation. By leveraging the visual question answering capabilities of vision-language models (VLMs) together with exploration rates, RATE-Nav enables efficient termination. It achieves a success rate of 67.8% and an SPL of 31.3% on the HM3D dataset, and on the more challenging MP3D dataset it shows an improvement of approximately 10% over previous zero-shot methods.
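The abstract describes the method only at a high level. As a minimal, hypothetical sketch of the exploration-rate idea it outlines -- track how much of the current region has been observed and terminate once the marginal gain per step diminishes -- the following Python fragment may help. The grid encoding, thresholds, and all names are assumptions; region segmentation and the VLM query are left out entirely; this is not the authors' implementation.

```python
import numpy as np

# Assumed occupancy-grid encoding; RATE-Nav's actual representation may differ.
UNKNOWN, FREE, OCCUPIED = -1, 0, 1

def region_exploration_rate(grid: np.ndarray, region_mask: np.ndarray) -> float:
    """Fraction of a region's cells already observed (i.e., not UNKNOWN)."""
    cells = grid[region_mask]
    return float((cells != UNKNOWN).sum()) / max(cells.size, 1)

def diminishing_returns(rate_history: list, window: int = 10,
                        min_gain: float = 0.005) -> bool:
    """Signal termination when the exploration rate gained over the last
    `window` steps drops below `min_gain` (the diminishing marginal effect
    the abstract observes). Thresholds are illustrative guesses."""
    if len(rate_history) <= window:
        return False
    return rate_history[-1] - rate_history[-1 - window] < min_gain
```

In a full agent loop one would recompute the rate each step, append it to the history, and consult a VLM about whether the target could plausibly remain in the unexplored remainder before actually stopping; both pieces are beyond this sketch.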
Related papers
- ForesightNav: Learning Scene Imagination for Efficient Exploration [57.49417653636244]
We propose ForesightNav, a novel exploration strategy inspired by human imagination and reasoning. Our approach equips robotic agents with the capability to predict contextual information, such as occupancy and semantic details, for unexplored regions. We validate our imagination-based approach using the Structured3D dataset, demonstrating accurate occupancy prediction and superior performance in anticipating unseen scene geometry.
arXiv Detail & Related papers (2025-04-22T17:38:38Z)
- FrontierNet: Learning Visual Cues to Explore [54.8265603996238]
This work aims at leveraging 2D visual cues for efficient autonomous exploration, addressing the limitations of extracting goal poses from a 3D map. We propose a visual-only frontier-based exploration system, with FrontierNet as its core component. Our approach provides an alternative to existing 3D-dependent goal-extraction approaches, achieving a 15% improvement in early-stage exploration efficiency. (A classical frontier-detection baseline is sketched after this list.)
arXiv Detail & Related papers (2025-01-08T16:25:32Z)
- IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation [33.979481250363584]
This paper introduces a novel informative path planning and 3D object probability mapping approach.
The mapping module computes the probability of the object of interest through semantic segmentation and a Bayes filter (a minimal Bayes-update sketch appears after this list).
Although our planner follows a zero-shot approach, it achieves state-of-the-art performance as measured by the Success weighted by Path Length (SPL) and Soft SPL in the Habitat ObjectNav Challenge 2023.
arXiv Detail & Related papers (2024-10-25T17:11:33Z)
- VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model [28.79971953667143]
VoroNav is a semantic exploration framework to extract exploratory paths and planning nodes from a semantic map constructed in real time.
By harnessing topological and semantic information, VoroNav designs text-based descriptions of paths and images that are readily interpretable by a large language model.
arXiv Detail & Related papers (2024-01-05T08:05:07Z)
- Learning-Augmented Model-Based Planning for Visual Exploration [8.870188183999854]
We propose a novel exploration approach using learning-augmented model-based planning.
Visual sensing and advances in semantic mapping of indoor scenes are exploited.
Our approach surpasses the greedy strategies by 2.1% and the RL-based exploration methods by 8.4% in terms of coverage.
arXiv Detail & Related papers (2022-11-15T04:53:35Z)
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of step-by-step instructions.
This approach deviates from real-world problems, in which a human only describes what the object and its surroundings look like and asks the robot to start navigation from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
- Active Visual Information Gathering for Vision-Language Navigation [115.40768457718325]
Vision-language navigation (VLN) is the task in which an agent carries out navigational instructions inside photo-realistic environments.
One of the key challenges in VLN is how to conduct robust navigation by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment.
This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent VLN policy.
arXiv Detail & Related papers (2020-07-15T23:54:20Z)
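Two entries above describe machinery concrete enough to illustrate. For FrontierNet, which learns frontiers from 2D visual cues, the natural reference point is the classical geometric frontier detector it aims to replace. Below is a minimal sketch of that baseline on a 2D occupancy grid, assuming the same cell encoding as the earlier sketch and using SciPy morphology; it shows the conventional baseline, not FrontierNet itself.

```python
import numpy as np
from scipy import ndimage

UNKNOWN, FREE, OCCUPIED = -1, 0, 1  # assumed encoding, as in the sketch above

def detect_frontiers(grid: np.ndarray) -> np.ndarray:
    """Boolean mask of frontier cells: FREE cells with at least one UNKNOWN
    neighbor (8-connectivity). This is the classical geometric definition
    that visual-cue methods like FrontierNet seek to supersede."""
    near_unknown = ndimage.binary_dilation(grid == UNKNOWN,
                                           structure=np.ones((3, 3), bool))
    return (grid == FREE) & near_unknown

def frontier_goals(frontier_mask: np.ndarray) -> list:
    """Cluster frontier cells into connected components and return the
    component centroids as candidate exploration goals."""
    labels, n = ndimage.label(frontier_mask)
    return ndimage.center_of_mass(frontier_mask, labels, list(range(1, n + 1)))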
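For IPPON's mapping module, the abstract states that the object probability is computed through semantic segmentation and a Bayes filter. A minimal per-cell binary Bayes update consistent with that description might look as follows; the sensor model (true/false-positive rates) and all names are assumptions, not IPPON's actual formulation.

```python
import numpy as np

def bayes_update(prior: np.ndarray, detected: np.ndarray, observed: np.ndarray,
                 p_tp: float = 0.9, p_fp: float = 0.05) -> np.ndarray:
    """One binary Bayes-filter update of a per-cell object probability map.
    `detected`: cells where the semantic segmenter reports the target this
    frame; `observed`: cells inside the current field of view. Cells outside
    the view keep their prior. p_tp / p_fp are assumed detector rates."""
    like_obj = np.where(detected, p_tp, 1.0 - p_tp)  # P(z | object here)
    like_bg = np.where(detected, p_fp, 1.0 - p_fp)   # P(z | no object)
    post = like_obj * prior / (like_obj * prior + like_bg * (1.0 - prior) + 1e-12)
    return np.where(observed, post, prior)
```

Starting from a small uniform prior and applying this update every frame concentrates probability mass where repeated detections agree, which is the kind of quantity an informative path planner can then maximize along candidate paths.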