FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation
- URL: http://arxiv.org/abs/2310.07473v1
- Date: Wed, 11 Oct 2023 13:19:29 GMT
- Title: FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation
- Authors: Xinyu Sun, Peihao Chen, Jugang Fan, Thomas H. Li, Jian Chen, Mingkui Tan
- Abstract summary: We propose a Fine-grained Goal Prompting (FGPrompt) method for image-goal navigation.
FGPrompt preserves detailed information in the goal image and guides the observation encoder to pay attention to goal-relevant regions.
Our method brings significant performance improvement on 3 benchmark datasets.
- Score: 54.25416624924669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning to navigate to an image-specified goal is an important but
challenging task for autonomous systems. The agent must infer the goal location
from the viewpoint at which the goal picture was taken. Existing methods try to
solve this problem by learning a navigation policy that captures semantic
features of the goal image and the observation image independently and finally
fuses them to predict a sequence of navigation actions. However, these methods
suffer from two major limitations. 1) They may miss detailed information in the
goal image and thus fail to infer the goal location. 2) More critically, it is
hard to focus on the goal-relevant regions in the observation image, because
they attempt to understand the observation without goal conditioning. In this
paper, we
aim to overcome these limitations by designing a Fine-grained Goal Prompting
(FGPrompt) method for image-goal navigation. In particular, we leverage
fine-grained and high-resolution feature maps in the goal image as prompts to
perform conditioned embedding, which preserves detailed information in the goal
image and guides the observation encoder to pay attention to goal-relevant
regions. Compared with existing methods on the image-goal navigation benchmark,
our method brings significant performance improvement on 3 benchmark datasets
(i.e., Gibson, MP3D, and HM3D). Especially on Gibson, we surpass the
state-of-the-art success rate by 8% with only 1/50 of the model size. Project page:
https://xinyusun.github.io/fgprompt-pages
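To make the core idea concrete, below is a minimal sketch of one plausible realization: high-resolution goal feature maps modulate the observation encoder through FiLM-style conditioned embedding. The layer names, shapes, and fusion point are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FiLMGoalPrompt(nn.Module):
    """Sketch: goal feature maps produce per-channel scale/shift that
    modulate observation features (FiLM-style conditioned embedding).
    Shapes and layer choices are illustrative, not the paper's spec."""
    def __init__(self, channels: int):
        super().__init__()
        self.to_gamma = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_beta = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, obs_feat: torch.Tensor, goal_feat: torch.Tensor) -> torch.Tensor:
        # obs_feat, goal_feat: (B, C, H, W) mid-level feature maps
        gamma = self.to_gamma(goal_feat)   # goal-derived channel scales
        beta = self.to_beta(goal_feat)     # goal-derived channel shifts
        # Conditioned embedding: observation features are steered toward
        # goal-relevant regions instead of being encoded independently
        return gamma * obs_feat + beta

fuse = FiLMGoalPrompt(channels=128)
obs = torch.randn(1, 128, 28, 28)    # observation feature map
goal = torch.randn(1, 128, 28, 28)   # high-resolution goal feature map
fused = fuse(obs, goal)              # (1, 128, 28, 28)
```

Fusing at the feature-map level, rather than after global pooling, is what preserves the fine-grained detail the abstract emphasizes.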
Related papers
- Transformers for Image-Goal Navigation [0.0]
We present a generative Transformer-based model that jointly models image goals, camera observations, and the robot's past actions to predict future actions.
Our model captures and associates visual information across long time horizons, which aids effective navigation.
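A generic sketch of this kind of model: a causal Transformer over a goal token followed by interleaved observation and action tokens, predicting the next action. The dimensions, tokenization, and interleaving scheme are assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class GoalActionTransformer(nn.Module):
    """Generic sketch (not the paper's exact design): a causal Transformer
    over [goal, obs_1, act_1, obs_2, act_2, ..., obs_T] that predicts the
    next action from the latest observation token."""
    def __init__(self, dim=256, num_actions=4, num_layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.act_embed = nn.Embedding(num_actions, dim)
        self.head = nn.Linear(dim, num_actions)

    def forward(self, goal_tok, obs_toks, act_ids):
        # goal_tok: (B, 1, D); obs_toks: (B, T, D); act_ids: (B, T-1)
        act_toks = self.act_embed(act_ids)
        seq = [goal_tok]
        for t in range(obs_toks.size(1)):      # interleave obs and actions
            seq.append(obs_toks[:, t : t + 1])
            if t < act_toks.size(1):
                seq.append(act_toks[:, t : t + 1])
        x = torch.cat(seq, dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.encoder(x, mask=mask)          # causal self-attention
        return self.head(h[:, -1])              # logits for the next action

model = GoalActionTransformer()
logits = model(torch.randn(1, 1, 256), torch.randn(1, 5, 256),
               torch.randint(0, 4, (1, 4)))    # (1, 4) action logits
```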
arXiv Detail & Related papers (2024-05-23T03:01:32Z)
- GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation [65.71524410114797]
GOAT-Bench is a benchmark for the universal navigation task GO to AnyThing (GOAT).
In GOAT, the agent is directed to navigate to a sequence of targets specified by the category name, language description, or image.
We benchmark monolithic RL and modular methods on the GOAT task, analyzing their performance across modalities.
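The three goal modalities can be pictured as a tagged union over the target specification; the types below are illustrative only, not GOAT-Bench's actual API.

```python
from dataclasses import dataclass
from typing import List, Union

import numpy as np

# Illustrative types only; GOAT-Bench's real episode format may differ.
@dataclass
class CategoryGoal:
    name: str                 # e.g. "chair"

@dataclass
class LanguageGoal:
    description: str          # e.g. "the red armchair next to the window"

@dataclass
class ImageGoal:
    rgb: np.ndarray           # a picture of the target instance

Goal = Union[CategoryGoal, LanguageGoal, ImageGoal]

# A lifelong episode directs the agent to a sequence of mixed-modality targets
episode: List[Goal] = [
    CategoryGoal("chair"),
    LanguageGoal("the red armchair next to the window"),
    ImageGoal(np.zeros((480, 640, 3), dtype=np.uint8)),
]
```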
arXiv Detail & Related papers (2024-04-09T20:40:00Z)
- GaussNav: Gaussian Splatting for Visual Navigation [92.13664084464514]
Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.
Our framework constructs a novel map representation based on 3D Gaussian Splatting (3DGS).
Our framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset.
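For context, Success weighted by Path Length is the standard embodied-navigation metric (Anderson et al., 2018): success weighted by the ratio of shortest-path length to the path actually taken. A minimal implementation of the standard definition:

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length over N episodes:
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i is binary
    success, l_i the shortest-path length, p_i the agent's path length."""
    return sum(
        s * (l / max(p, l))
        for s, l, p in zip(successes, shortest_lengths, path_lengths)
    ) / len(successes)

# Two episodes: one success along a near-optimal path, one failure
print(spl([1, 0], [5.0, 7.0], [6.0, 7.0]))  # ~0.4167
```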
arXiv Detail & Related papers (2024-03-18T09:56:48Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
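The two-level decomposition can be sketched as follows: a high-level policy emits a latent subgoal ("latent states as actions"), and a low-level policy conditions on it to produce primitive actions. Shapes and the interface are illustrative assumptions, not HIQL's exact parameterization.

```python
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    """Maps (state, goal) to a latent subgoal representation rather than
    a primitive action. Illustrative shapes only."""
    def __init__(self, state_dim=64, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

class LowLevelPolicy(nn.Module):
    """Maps (state, latent subgoal) to primitive action logits."""
    def __init__(self, state_dim=64, latent_dim=16, num_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + latent_dim, 256),
                                 nn.ReLU(), nn.Linear(256, num_actions))

    def forward(self, state, subgoal):
        return self.net(torch.cat([state, subgoal], dim=-1))

state, goal = torch.randn(1, 64), torch.randn(1, 64)
subgoal = HighLevelPolicy()(state, goal)    # latent subgoal, not an action
logits = LowLevelPolicy()(state, subgoal)   # primitive action logits
```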
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
- Navigating to Objects Specified by Images [86.9672766351891]
We present a system that can navigate to objects specified by images in both simulation and the real world.
Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation.
On the HM3D InstanceImageNav benchmark, this system outperforms a baseline end-to-end RL policy by 7x and a state-of-the-art ImageNav model by 2.3x.
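The sub-task decomposition reads as a simple dispatch loop; everything below is a stub-level sketch of that control flow under assumed interfaces, not the system's actual modules.

```python
import random

# Stub modules standing in for the real components named above.
def reidentify(rgb, goal_image):      # goal instance re-identification
    return random.random() < 0.05

def localize_goal(obs):               # goal localization in the map frame
    return (1.0, 2.0)

def explore(obs):                     # exploration policy
    return random.choice(["forward", "left", "right"])

def local_navigate(obs, goal_pos):    # local point-goal navigation
    return "forward"

def navigate(env, goal_image, max_steps=500):
    goal_pos = None
    for _ in range(max_steps):
        obs = env.observe()
        if goal_pos is None:                   # explore until the goal is seen
            if reidentify(obs["rgb"], goal_image):
                goal_pos = localize_goal(obs)
                continue
            action = explore(obs)
        elif obs["pose"] == goal_pos:          # stub success check
            return True
        else:
            action = local_navigate(obs, goal_pos)
        env.step(action)
    return False
```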
arXiv Detail & Related papers (2023-04-03T17:58:00Z)
- Last-Mile Embodied Visual Navigation [31.622495628224403]
We propose SLING to improve the performance of image-goal navigation systems.
We focus on last-mile navigation and leverage the underlying geometric structure of the problem with neural descriptors.
On a standardized image-goal navigation benchmark, we improve performance across policies, scenes, and episode complexity, raising the state-of-the-art from 45% to 55% success rate.
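Descriptor-based last-mile guidance can be sketched roughly as below. ORB is used as a classical stand-in for the neural descriptors mentioned above, and the steering rule is a deliberately crude heuristic; neither reflects SLING's actual matching or pose-estimation pipeline.

```python
import cv2
import numpy as np

def last_mile_hint(obs_bgr: np.ndarray, goal_bgr: np.ndarray):
    """Match descriptors between the current view and the goal image and
    derive a coarse steering hint. Illustrative heuristic only."""
    orb = cv2.ORB_create(nfeatures=500)
    g1 = cv2.cvtColor(obs_bgr, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(goal_bgr, cv2.COLOR_BGR2GRAY)
    k1, d1 = orb.detectAndCompute(g1, None)
    k2, d2 = orb.detectAndCompute(g2, None)
    if d1 is None or d2 is None:
        return None                      # goal not visible: keep exploring
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:50]
    if len(matches) < 8:
        return None
    # Mean horizontal keypoint offset as a crude steering signal
    dx = float(np.mean([k1[m.queryIdx].pt[0] - k2[m.trainIdx].pt[0]
                        for m in matches]))
    return "turn_right" if dx > 20 else "turn_left" if dx < -20 else "forward"
```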
arXiv Detail & Related papers (2022-11-21T18:59:58Z)
- SGoLAM: Simultaneous Goal Localization and Mapping for Multi-Object Goal Navigation [5.447924312563365]
We present SGoLAM, a simple and efficient algorithm for Multi-Object Goal navigation.
Given an agent equipped with an RGB-D camera and a GPS/Compass sensor, our objective is to have the agent navigate to a sequence of target objects in realistic 3D environments.
SGoLAM is ranked 2nd in the CVPR 2021 MultiON (Multi-Object Goal Navigation) challenge.
arXiv Detail & Related papers (2021-10-14T06:15:14Z)
- Memory-Augmented Reinforcement Learning for Image-Goal Navigation [67.3963444878746]
We present a novel method that leverages a cross-episode memory to learn to navigate.
In order to avoid overfitting, we propose to use data augmentation on the RGB input during training.
We obtain competitive performance from RGB input only, without access to additional sensors such as position or depth.
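The RGB augmentation mentioned above might look like the following torchvision sketch; the paper's exact transform set is not given in this summary.

```python
import torch
from torchvision import transforms

# Illustrative RGB-only augmentation; the paper's exact choices may differ.
augment = transforms.Compose([
    transforms.RandomResizedCrop(128, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
])

frame = torch.rand(3, 160, 160)   # one RGB observation in [0, 1]
augmented = augment(frame)        # a randomized view fed to the policy
```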
arXiv Detail & Related papers (2021-01-13T16:30:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.