SOON: Scenario Oriented Object Navigation with Graph-based Exploration
- URL: http://arxiv.org/abs/2103.17138v1
- Date: Wed, 31 Mar 2021 15:01:04 GMT
- Title: SOON: Scenario Oriented Object Navigation with Graph-based Exploration
- Authors: Fengda Zhu, Xiwen Liang, Yi Zhu, Xiaojun Chang, Xiaodan Liang
- Abstract summary: The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of step-by-step instructions.
This setting deviates from real-world problems, in which a human only describes what the object and its surroundings look like and asks the robot to start navigating from anywhere.
- Score: 102.74649829684617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to navigate like a human towards a language-guided target from
anywhere in a 3D embodied environment is one of the 'holy grail' goals of
intelligent robots. Most visual navigation benchmarks, however, focus on
navigating toward a target from a fixed starting point, guided by an elaborate
set of step-by-step instructions. This setting deviates from real-world
problems, in which a human only describes what the object and its surroundings
look like and asks the robot to start navigating from anywhere.
Accordingly, in this paper, we introduce a Scenario Oriented Object Navigation
(SOON) task. In this task, an agent is required to navigate from an arbitrary
position in a 3D embodied environment to localize a target following a scene
description. As a promising direction for solving this task, we propose a
novel graph-based exploration (GBE) method, which models the navigation state
as a graph, learns knowledge from that graph, and stabilizes training by
learning from sub-optimal trajectories. We also propose a new large-scale
benchmark named the From Anywhere to Object (FAO) dataset. To avoid target
ambiguity, the descriptions in FAO provide rich semantic scene information,
including object attributes, object relationships, region descriptions, and
nearby region descriptions. Our experiments show that the proposed GBE
outperforms various state-of-the-art methods on both the FAO and R2R
datasets, and ablation studies on FAO validate the quality of the dataset.
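The abstract describes GBE only at a high level: the navigation state is modeled as a graph, and exploration is driven by that graph. As a rough illustration of what a graph-structured navigation state can look like, here is a minimal Python sketch of generic frontier-based exploration over a partially observed viewpoint graph. It is not the authors' GBE implementation; the class `NavGraph`, the function `explore_step`, and the `score` callable (a stand-in for a learned, description-conditioned model) are hypothetical names introduced only for this example.

```python
from collections import deque
from typing import Callable, Dict, List, Set


class NavGraph:
    """Hypothetical navigation-state graph: nodes are viewpoint ids, edges are navigable links."""

    def __init__(self) -> None:
        self.adj: Dict[str, Set[str]] = {}
        self.visited: Set[str] = set()

    def add_edge(self, a: str, b: str) -> None:
        # Links are treated as undirected (traversable both ways).
        self.adj.setdefault(a, set()).add(b)
        self.adj.setdefault(b, set()).add(a)

    def visit(self, node: str, neighbours: List[str]) -> None:
        # Mark the node as visited and record the neighbours observed from it.
        self.visited.add(node)
        self.adj.setdefault(node, set())
        for n in neighbours:
            self.add_edge(node, n)

    def frontier(self) -> List[str]:
        # Observed but not yet visited nodes are candidate exploration targets.
        return sorted(n for n in self.adj if n not in self.visited)

    def shortest_path(self, start: str, goal: str) -> List[str]:
        # Breadth-first search over the known (partial) graph; [] if unreachable.
        prev: Dict[str, str] = {start: start}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == goal:
                break
            for nxt in sorted(self.adj.get(cur, ())):
                if nxt not in prev:
                    prev[nxt] = cur
                    queue.append(nxt)
        if goal not in prev:
            return []
        path = [goal]
        while path[-1] != start:
            path.append(prev[path[-1]])
        return path[::-1]


def explore_step(graph: NavGraph, current: str, score: Callable[[str], float]) -> List[str]:
    """Pick the highest-scoring frontier node and return the path from `current` to it."""
    candidates = graph.frontier()
    if not candidates:
        return []
    target = max(candidates, key=score)
    return graph.shortest_path(current, target)


if __name__ == "__main__":
    g = NavGraph()
    g.visit("A", ["B", "C"])
    g.visit("B", ["A", "D"])
    # Hypothetical scores standing in for a language-conditioned model.
    scores = {"C": 0.2, "D": 0.9}
    print(explore_step(g, "B", lambda n: scores.get(n, 0.0)))  # ['B', 'D']
```

Keeping visited and observed-but-unvisited nodes in one graph lets the agent return to any earlier frontier along a path through known nodes, instead of choosing only among its immediate neighbours. How the actual method scores candidates and learns from sub-optimal trajectories is not modeled here.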
 
      
Related papers
- Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation [54.04601077224252]
 Embodied scene understanding requires not only comprehending visual-spatial information but also determining where to explore next in the 3D physical world.
3D vision-language learning enables embodied agents to effectively explore and understand their environment.
The model's versatility enables navigation using diverse input modalities, including categories, language descriptions, and reference images.
 arXiv (2025-07-05T14:15:52Z)
- SemNav: A Model-Based Planner for Zero-Shot Object Goal Navigation Using Vision-Foundation Models [10.671262416557704]
 Vision Foundation Models (VFMs) offer powerful capabilities for visual understanding and reasoning.
We present a zero-shot object goal navigation framework that integrates the perceptual strength of VFMs with a model-based planner.
We evaluate our approach on the HM3D dataset using the Habitat simulator and demonstrate that our method achieves state-of-the-art performance.
 arXiv (2025-06-04T03:04:54Z)
- GaussNav: Gaussian Splatting for Visual Navigation [92.13664084464514]
 Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.
Our framework constructs a novel map representation based on 3D Gaussian Splatting (3DGS).
Our framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset.
 arXiv (2024-03-18T09:56:48Z)
- Right Place, Right Time! Dynamizing Topological Graphs for Embodied Navigation [55.581423861790945]
 Embodied Navigation tasks often involve constructing topological graphs of a scene during exploration.
We introduce structured object transitions to dynamize static topological graphs, yielding what we call Object Transition Graphs (OTGs).
OTGs simulate portable targets following structured routes inspired by human habits.
 arXiv (2024-03-14T22:33:22Z)
- Aligning Knowledge Graph with Visual Perception for Object-goal Navigation [16.32780793344835]
 We propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation.
Our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language description with visual perception.
The integration of a continuous knowledge graph architecture and multimodal feature alignment empowers the navigator with a remarkable zero-shot navigation capability.
 arXiv (2024-02-29T06:31:18Z)
- VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model [28.79971953667143]
 VoroNav is a semantic exploration framework that extracts exploratory paths and planning nodes from a semantic map constructed in real time.
By harnessing topological and semantic information, VoroNav designs text-based descriptions of paths and images that are readily interpretable by a large language model.
 arXiv (2024-01-05T08:05:07Z)
- VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation [36.31724466541213]
 We introduce a zero-shot navigation approach, Vision-Language Frontier Maps (VLFM).
VLFM is inspired by human reasoning and designed to navigate towards unseen semantic objects in novel environments.
We evaluate VLFM in photo-realistic environments from the Gibson, Habitat-Matterport 3D (HM3D), and Matterport 3D (MP3D) datasets within the Habitat simulator.
 arXiv (2023-12-06T04:02:28Z)
- Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
 We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
 arXiv (2023-08-10T14:21:33Z)
- How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers [94.46825166907831]
 We present a training-free solution to tackle the object goal navigation problem in Embodied AI.
Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework.
Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
 arXiv (2023-05-26T13:38:33Z)
- Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation [87.52136927091712]
 We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both the spatial locations and the semantic information of objects in the environment.
We propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
 arXiv (2022-10-14T04:23:27Z)
- ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
 Long-range navigation requires both planning and reasoning about local traversability.
We propose a learning-based approach that integrates learning and planning.
ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away.
 arXiv (2022-02-23T02:14:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
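The GaussNav entry above reports Success weighted by Path Length (SPL). Under the commonly used definition, SPL = (1/N) Σ_i S_i · ℓ_i / max(p_i, ℓ_i), where S_i is 1 if episode i succeeds and 0 otherwise, ℓ_i is the shortest-path distance from start to goal, and p_i is the length of the path the agent actually took. The minimal Python sketch below computes it; the function name `spl` and the episode values are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Sequence


def spl(successes: Sequence[bool], shortest: Sequence[float], taken: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over N episodes.

    successes[i]: whether episode i reached the goal (S_i)
    shortest[i]:  shortest-path distance from start to goal (l_i), assumed > 0
    taken[i]:     length of the path the agent actually travelled (p_i)
    """
    if not (len(successes) == len(shortest) == len(taken)):
        raise ValueError("all inputs must have the same length")
    if not successes:
        raise ValueError("need at least one episode")
    total = 0.0
    for s, l, p in zip(successes, shortest, taken):
        if s:
            # A successful episode contributes l / max(p, l), which is at most 1.
            total += l / max(p, l)
    return total / len(successes)


# Three hypothetical episodes: optimal success, success with a 2x detour, failure.
print(spl([True, True, False], [10.0, 8.0, 5.0], [10.0, 16.0, 7.0]))  # 0.5
```

Because failures contribute zero and detours are penalized proportionally, SPL rewards agents that both succeed and take near-shortest routes, which is why it is reported alongside plain success rate.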
       
     