LGR: LLM-Guided Ranking of Frontiers for Object Goal Navigation
- URL: http://arxiv.org/abs/2503.20241v1
- Date: Wed, 26 Mar 2025 05:15:26 GMT
- Title: LGR: LLM-Guided Ranking of Frontiers for Object Goal Navigation
- Authors: Mitsuaki Uno, Kanji Tanaka, Daiki Iwata, Yudai Noda, Shoya Miyazaki, Kouki Terashima,
- Abstract summary: This study aims to enhance recent modular mapless OGN systems by leveraging the commonsense reasoning capabilities of large language models (LLMs)<n>We address the challenge of determining the visiting order in frontier-based exploration by framing it as a frontier ranking problem.<n>Our approach is grounded in recent findings that, while LLMs cannot determine the absolute value of a frontier, they excel at evaluating the relative value between multiple frontiers viewed within a single image using the view image as context.
- Score: 1.1874952582465603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object Goal Navigation (OGN) is a fundamental task for robots and AI, with key applications such as mobile robot image databases (MRID). In particular, mapless OGN is essential in scenarios involving unknown or dynamic environments. This study aims to enhance recent modular mapless OGN systems by leveraging the commonsense reasoning capabilities of large language models (LLMs). Specifically, we address the challenge of determining the visiting order in frontier-based exploration by framing it as a frontier ranking problem. Our approach is grounded in recent findings that, while LLMs cannot determine the absolute value of a frontier, they excel at evaluating the relative value between multiple frontiers viewed within a single image using the view image as context. We dynamically manage the frontier list by adding and removing elements, using an LLM as a ranking model. The ranking results are represented as reciprocal rank vectors, which are ideal for multi-view, multi-query information fusion. We validate the effectiveness of our method through evaluations in Habitat-Sim.
Related papers
- History-Augmented Vision-Language Models for Frontier-Based Zero-Shot Object Navigation [5.343932820859596]
This paper introduces a novel zero-shot ObjectNav framework that pioneers the use of dynamic, history-aware prompting.<n>Our core innovation lies in providing the VLM with action history context, enabling it to generate semantic guidance scores for navigation actions.<n>We also introduce a VLM-assisted waypoint generation mechanism for refining the final approach to detected objects.
arXiv Detail & Related papers (2025-06-19T21:50:16Z) - TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation [3.2688425993442696]
We propose a modular approach for Vision-Language Navigation (VLN)
We use state-of-the-art Large Language Models (LLMs) and Vision-Language Models (VLMs) in a zero-shot setting.
We demonstrate superior performance compared to other approaches that use joint semantic maps.
arXiv Detail & Related papers (2025-02-11T07:09:37Z) - FrontierNet: Learning Visual Cues to Explore [54.8265603996238]
This work aims at leveraging 2D visual cues for efficient autonomous exploration, addressing the limitations of extracting goal poses from a 3D map.<n>We propose a visual-only frontier-based exploration system, with FrontierNet as its core component.<n>Our approach provides an alternative to existing 3D-dependent goal-extraction approaches, achieving a 15% improvement in early-stage exploration efficiency.
arXiv Detail & Related papers (2025-01-08T16:25:32Z) - TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation [52.422619828854984]
We introduce TopV-Nav, an MLLM-based method that directly reasons on the top-view map with sufficient spatial information.<n>To fully unlock the MLLM's spatial reasoning potential in top-view perspective, we propose the Adaptive Visual Prompt Generation (AVPG) method.
arXiv Detail & Related papers (2024-11-25T14:27:55Z) - Correctable Landmark Discovery via Large Models for Vision-Language Navigation [89.15243018016211]
Vision-Language Navigation (VLN) requires the agent to follow language instructions to reach a target position.
Previous VLN agents fail to perform accurate modality alignment especially in unexplored scenes.
We propose a new VLN paradigm, called COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE)
arXiv Detail & Related papers (2024-05-29T03:05:59Z) - Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation [1.2473780585666772]
Most Vision-and-Language Navigation (VLN) algorithms are prone to making inaccurate decisions due to their lack of visual common sense and limited reasoning capabilities.
We propose a Hierarchical Spatial Proximity Reasoning (HSPR) method to help the agent build a knowledge base of hierarchical spatial proximity.
We validate our approach with experiments on publicly available datasets including REVERIE, SOON, R2R, and R4R.
arXiv Detail & Related papers (2024-03-18T07:51:22Z) - NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning [97.88246428240872]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.<n>Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability.<n>This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), where we fulfill parameter-efficient in-domain training to enable self-guided navigational decision.
arXiv Detail & Related papers (2024-03-12T07:27:02Z) - Mind the Gap: Improving Success Rate of Vision-and-Language Navigation
by Revisiting Oracle Success Routes [25.944819618283613]
Vision-and-Language Navigation (VLN) aims to navigate to the target location by following a given instruction.
We make the first attempt to tackle a long-ignored problem in VLN: narrowing the gap between Success Rate (SR) and Oracle Success Rate (OSR)
arXiv Detail & Related papers (2023-08-07T01:43:25Z) - How To Not Train Your Dragon: Training-free Embodied Object Goal
Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI.
Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework.
Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
arXiv Detail & Related papers (2023-05-26T13:38:33Z) - Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Guided Exploration
for Zero-Shot Object Navigation [58.3480730643517]
We present LGX, a novel algorithm for Language-Driven Zero-Shot Object Goal Navigation (L-ZSON)
Our approach makes use of Large Language Models (LLMs) for this task.
We achieve state-of-the-art zero-shot object navigation results on RoboTHOR with a success rate (SR) improvement of over 27% over the current baseline.
arXiv Detail & Related papers (2023-03-06T20:19:19Z) - ReVoLT: Relational Reasoning and Voronoi Local Graph Planning for
Target-driven Navigation [1.0896567381206714]
Embodied AI is an inevitable trend that emphasizes the interaction between intelligent entities and the real world.
Recent works focus on exploiting layout relationships by graph neural networks (GNNs)
We decouple this task and propose ReVoLT, a hierarchical framework.
arXiv Detail & Related papers (2023-01-06T05:19:56Z) - ULN: Towards Underspecified Vision-and-Language Navigation [77.81257404252132]
Underspecified vision-and-Language Navigation (ULN) is a new setting for vision-and-Language Navigation (VLN)
We propose a VLN framework that consists of a classification module, a navigation agent, and an Exploitation-to-Exploration (E2E) module.
Our framework is more robust and outperforms the baselines on ULN by 10% relative success rate across all levels.
arXiv Detail & Related papers (2022-10-18T17:45:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.