Occupancy Anticipation for Efficient Exploration and Navigation
- URL: http://arxiv.org/abs/2008.09285v2
- Date: Tue, 25 Aug 2020 16:36:11 GMT
- Title: Occupancy Anticipation for Efficient Exploration and Navigation
- Authors: Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman
- Abstract summary: We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
- Score: 97.17517060585875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art navigation methods leverage a spatial memory to generalize
to new environments, but their occupancy maps are limited to capturing the
geometric structures directly observed by the agent. We propose occupancy
anticipation, where the agent uses its egocentric RGB-D observations to infer
the occupancy state beyond the visible regions. In doing so, the agent builds
its spatial awareness more rapidly, which facilitates efficient exploration and
navigation in 3D environments. By exploiting context in both the egocentric
views and top-down maps, our model successfully anticipates a broader map of the
environment, with performance significantly better than strong baselines.
Furthermore, when deployed for the sequential decision-making tasks of
exploration and navigation, our model outperforms state-of-the-art methods on
the Gibson and Matterport3D datasets. Our approach is the winning entry in the
2020 Habitat PointNav Challenge. Project page:
http://vision.cs.utexas.edu/projects/occupancy_anticipation/
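To make the idea concrete, here is a minimal PyTorch sketch of an occupancy-anticipation model. It is an illustration only, not the authors' architecture: the paper's full model also exploits RGB context and feeds into an exploration policy, and all layer sizes, channel layouts, and names below are assumptions.

```python
import torch
import torch.nn as nn

class OccupancyAnticipator(nn.Module):
    """Encoder-decoder that completes a partial egocentric occupancy map."""
    def __init__(self, in_channels=2, out_channels=2):
        super().__init__()
        # Encoder compresses the partial local map (projected from depth).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder upsamples back to map resolution, filling in occupancy
        # estimates for cells the agent has not directly observed.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, partial_map):
        # partial_map: (B, 2, H, W) with channels (occupied, explored);
        # cells outside the visible region carry no evidence.
        return torch.sigmoid(self.decoder(self.encoder(partial_map)))

# Training would supervise against ground-truth local maps with a per-cell
# binary cross-entropy, so the model learns to anticipate walls and free
# space beyond the visible field of view.
model = OccupancyAnticipator()
anticipated = model(torch.zeros(1, 2, 128, 128))  # -> (1, 2, 128, 128)
```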
Related papers
- CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information [25.51740922661166]
Vision-and-language navigation (VLN) aims to guide autonomous agents through real-world environments by integrating visual and linguistic cues.
We introduce CityNav, a novel dataset explicitly designed for language-guided aerial navigation in 3D environments of real cities.
CityNav comprises 32k natural language descriptions paired with human demonstration trajectories, collected via a newly developed web-based 3D simulator.
arXiv Detail & Related papers (2024-06-20T12:08:27Z)
- Pixel to Elevation: Learning to Predict Elevation Maps at Long Range using Images for Autonomous Offroad Navigation [10.898724668444125]
We present a learning-based approach that predicts terrain elevation maps at long range in real time, using only onboard egocentric images.
We experimentally validate the applicability of our proposed approach for autonomous offroad robotic navigation in complex and unstructured terrain.
arXiv Detail & Related papers (2024-01-30T22:37:24Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transitions, to the agent's egocentric representations for navigation (a minimal sketch of this kind of map-view contrastive objective appears after this list).
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- Uncertainty-driven Planner for Exploration and Navigation [36.933903274373336]
We consider the problems of exploration and point-goal navigation in previously unseen environments.
We argue that learning occupancy priors over indoor maps provides significant advantages towards addressing these problems.
We present a novel planning framework that first learns to generate occupancy maps beyond the field-of-view of the agent.
arXiv Detail & Related papers (2022-02-24T05:25:31Z)
- Structured Scene Memory for Vision-Language Navigation [155.63025602722712]
We propose a structured scene memory (SSM) architecture for vision-language navigation (VLN).
It is compartmentalized enough to accurately memorize the percepts during navigation.
It also serves as a structured scene representation, which captures and disentangles visual and geometric cues in the environment.
arXiv Detail & Related papers (2021-03-05T03:41:00Z)
- Active Visual Information Gathering for Vision-Language Navigation [115.40768457718325]
Vision-language navigation (VLN) is the task of directing an agent to carry out navigational instructions inside photo-realistic environments.
One of the key challenges in VLN is how to conduct a robust navigation by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment.
This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent VLN policy.
arXiv Detail & Related papers (2020-07-15T23:54:20Z)
- Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships [52.72020203771489]
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes.
Our proposed method combines visual features and 3D spatial representations to learn navigation policy.
Our experiments, performed in AI2-THOR, show that our model outperforms the baselines on both success rate (SR) and success weighted by path length (SPL).
arXiv Detail & Related papers (2020-04-29T08:46:38Z)
- Learning to Move with Affordance Maps [57.198806691838364]
The ability to autonomously explore and navigate a physical space is a fundamental requirement for virtually any mobile autonomous agent.
Traditional SLAM-based approaches for exploration and navigation largely focus on leveraging scene geometry.
We show that learned affordance maps can be used to augment traditional approaches for both exploration and navigation, providing significant improvements in performance.
arXiv Detail & Related papers (2020-01-08T04:05:11Z)
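The Ego$^2$-Map entry above describes contrasting the agent's egocentric views with semantic maps. Below is a minimal sketch of such a map-view contrastive objective, written as a standard InfoNCE loss; the batch pairing, embedding width, and temperature are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def map_view_contrastive_loss(view_emb, map_emb, temperature=0.07):
    # view_emb, map_emb: (B, D); row i of each comes from the same location,
    # so the matching pairs sit on the diagonal of the similarity matrix.
    view_emb = F.normalize(view_emb, dim=1)
    map_emb = F.normalize(map_emb, dim=1)
    logits = view_emb @ map_emb.t() / temperature  # (B, B) cosine similarities
    targets = torch.arange(view_emb.size(0), device=view_emb.device)
    # Pull each view toward its own map, push it away from the others.
    return F.cross_entropy(logits, targets)

# Usage: embeddings from any view encoder and map encoder of matching width.
loss = map_view_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```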