Diffusion as Reasoning: Enhancing Object Goal Navigation with LLM-Biased Diffusion Model
- URL: http://arxiv.org/abs/2410.21842v1
- Date: Tue, 29 Oct 2024 08:10:06 GMT
- Title: Diffusion as Reasoning: Enhancing Object Goal Navigation with LLM-Biased Diffusion Model
- Authors: Yiming Ji, Yang Liu, Zhengpu Wang, Boyu Ma, Zongwu Xie, Hong Liu,
- Abstract summary: We propose a new approach to solving the ObjectNav task, by training a diffusion model to learn the statistical distribution patterns of objects in semantic maps.
We also propose the global target bias and local LLM bias methods, where the former can constrain the diffusion model to generate the target object more effectively.
Based on the generated map in the unknown region, the agent sets the predicted location of the target as the goal and moves towards it.
- Score: 9.939998139837426
- License:
- Abstract: The Object Goal Navigation (ObjectNav) task requires the agent to navigate to a specified target in an unseen environment. Since the environment layout is unknown, the agent needs to perform semantic reasoning to infer the potential location of the target, based on its accumulated memory of the environment during the navigation process. Diffusion models have been shown to be able to learn the distribution relationships between features in RGB images, and thus generate new realistic images.In this work, we propose a new approach to solving the ObjectNav task, by training a diffusion model to learn the statistical distribution patterns of objects in semantic maps, and using the map of the explored regions during navigation as the condition to generate the map of the unknown regions, thereby realizing the semantic reasoning of the target object, i.e., diffusion as reasoning (DAR). Meanwhile, we propose the global target bias and local LLM bias methods, where the former can constrain the diffusion model to generate the target object more effectively, and the latter utilizes the common sense knowledge extracted from the LLM to improve the generalization of the reasoning process. Based on the generated map in the unknown region, the agent sets the predicted location of the target as the goal and moves towards it. Experiments on Gibson and MP3D show the effectiveness of our method.
Related papers
- GOMAA-Geo: GOal Modality Agnostic Active Geo-localization [49.599465495973654]
We consider the task of active geo-localization (AGL) in which an agent uses a sequence of visual cues observed during aerial navigation to find a target specified through multiple possible modalities.
GOMAA-Geo is a goal modality active geo-localization agent for zero-shot generalization between different goal modalities.
arXiv Detail & Related papers (2024-06-04T02:59:36Z) - Mapping High-level Semantic Regions in Indoor Environments without
Object Recognition [50.624970503498226]
The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
arXiv Detail & Related papers (2024-03-11T18:09:50Z) - NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - How To Not Train Your Dragon: Training-free Embodied Object Goal
Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI.
Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework.
Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
arXiv Detail & Related papers (2023-05-26T13:38:33Z) - PEANUT: Predicting and Navigating to Unseen Targets [18.87376347895365]
Efficient ObjectGoal navigation (ObjectNav) in novel environments requires an understanding of the spatial and semantic regularities in environment layouts.
We present a method for learning these regularities by predicting the locations of unobserved objects from incomplete semantic maps.
Our prediction model is lightweight and can be trained in a supervised manner using a relatively small amount of passively collected data.
arXiv Detail & Related papers (2022-12-05T18:58:58Z) - Navigating to Objects in Unseen Environments by Distance Prediction [16.023495311387478]
We propose an object goal navigation framework, which could directly perform path planning based on an estimated distance map.
Specifically, our model takes a birds-eye-view semantic map as input, and estimates the distance from the map cells to the target object.
With the estimated distance map, the agent could explore the environment and navigate to the target objects based on either human-designed or learned navigation policy.
arXiv Detail & Related papers (2022-02-08T09:22:50Z) - SGoLAM: Simultaneous Goal Localization and Mapping for Multi-Object Goal
Navigation [5.447924312563365]
We present SGoLAM, a simple and efficient algorithm for Multi-Object Goal navigation.
Given an agent equipped with an RGB-D camera and a GPS/ sensor, our objective is to have the agent navigate to a sequence of target objects in realistic 3D environments.
SGoLAM is ranked 2nd in the CVPR 2021 MultiON (Multi-Object Goal Navigation) challenge.
arXiv Detail & Related papers (2021-10-14T06:15:14Z) - Learning to Map for Active Semantic Goal Navigation [40.193928212509356]
We propose a novel framework that actively learns to generate semantic maps outside the field of view of the agent.
We show how different objectives can be defined by balancing exploration with exploitation.
Our method is validated in the visually realistic environments offered by the Matterport3D dataset.
arXiv Detail & Related papers (2021-06-29T18:01:30Z) - Object Goal Navigation using Goal-Oriented Semantic Exploration [98.14078233526476]
This work studies the problem of object goal navigation which involves navigating to an instance of the given object category in unseen environments.
We propose a modular system called, Goal-Oriented Semantic Exploration' which builds an episodic semantic map and uses it to explore the environment efficiently.
arXiv Detail & Related papers (2020-07-01T17:52:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.