Is Mapping Necessary for Realistic PointGoal Navigation?
- URL: http://arxiv.org/abs/2206.00997v1
- Date: Thu, 2 Jun 2022 11:37:27 GMT
- Title: Is Mapping Necessary for Realistic PointGoal Navigation?
- Authors: Ruslan Partsey, Erik Wijmans, Naoki Yokoyama, Oles Dobosevych, Dhruv
Batra, Oleksandr Maksymets
- Abstract summary: We show that, under idealized settings, map-less neural models can achieve 100% Success on a standard dataset.
We then identify the main cause of the drop in performance under realistic settings: the absence of GPS+Compass.
We develop human-annotation-free data-augmentation techniques to train models for visual odometry.
- Score: 44.54452415882708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can an autonomous agent navigate in a new environment without building an
explicit map?
For the task of PointGoal navigation ('Go to $\Delta x$, $\Delta y$') under
idealized settings (no RGB-D and actuation noise, perfect GPS+Compass), the
answer is a clear 'yes' - map-less neural models composed of task-agnostic
components (CNNs and RNNs) trained with large-scale reinforcement learning
achieve 100% Success on a standard dataset (Gibson). However, for PointNav in a
realistic setting (RGB-D and actuation noise, no GPS+Compass), this is an open
question; one we tackle in this paper. The strongest published result for this
task is 71.7% Success.
First, we identify the main (perhaps, only) cause of the drop in performance:
the absence of GPS+Compass. An agent with perfect GPS+Compass faced with RGB-D
sensing and actuation noise achieves 99.8% Success (Gibson-v2 val). This
suggests that (to paraphrase a meme) robust visual odometry is all we need for
realistic PointNav; if we can achieve that, we can ignore the sensing and
actuation noise.
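To make that hypothesis concrete, here is a minimal sketch (not the authors' code; the VO model, policy, and environment interfaces are assumptions for illustration) of how per-step egomotion estimates from visual odometry can stand in for GPS+Compass by re-expressing the PointGoal in the agent's current frame at every step:

```python
import numpy as np

def update_goal(goal_xy: np.ndarray, dx: float, dy: float, dtheta: float) -> np.ndarray:
    """Re-express the goal (x, y) in the agent's new egocentric frame after the
    agent translates by (dx, dy) and rotates by dtheta (radians).
    Axis and ordering conventions here are illustrative assumptions."""
    # Shift the goal into a frame centered at the agent's new position...
    shifted = goal_xy - np.array([dx, dy])
    # ...then rotate by -dtheta to account for the agent's new heading.
    c, s = np.cos(-dtheta), np.sin(-dtheta)
    rot = np.array([[c, -s], [s, c]])
    return rot @ shifted

def navigate(env, vo_model, policy, goal_xy):
    """Hypothetical navigation loop: `env`, `vo_model`, and `policy` are assumed
    interfaces. The goal vector starts as the episode's (Delta x, Delta y) and is
    kept current from VO estimates instead of GPS+Compass readings."""
    prev_obs = env.reset()
    goal = np.asarray(goal_xy, dtype=np.float64)
    done = False
    while not done:
        action = policy(prev_obs, goal)
        obs, done = env.step(action)               # assumed to return (obs, done)
        dx, dy, dtheta = vo_model(prev_obs, obs)   # egomotion from an RGB-D pair
        goal = update_goal(goal, dx, dy, dtheta)
        prev_obs = obs
```

The point of the sketch is that the policy never needs an explicit map or absolute localization; it only needs the goal vector, which visual odometry keeps up to date.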
With that as our operating hypothesis, we scale the dataset and model size,
and develop human-annotation-free data-augmentation techniques to train models
for visual odometry. We advance the state of art on the Habitat Realistic
PointNav Challenge from 71% to 94% Success (+32.4% relative) and 53% to 74%
SPL (+39.6% relative). While our approach does not saturate or 'solve' this
dataset, this strong improvement combined with promising zero-shot sim2real
transfer (to a LoCoBot) provides evidence consistent with the hypothesis that
explicit mapping may not be necessary for navigation, even in a realistic
setting.
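For reference, Success is the fraction of episodes in which the agent stops within a fixed distance of the goal, and SPL (Success weighted by Path Length, Anderson et al. 2018) additionally penalizes inefficient paths. The sketch below (illustrative, not from the paper) shows the standard SPL formula and how the quoted relative gains follow from the absolute numbers:

```python
def spl(successes, shortest_lengths, agent_lengths):
    """Success weighted by Path Length (Anderson et al., 2018):
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i),
    where S_i is the success indicator, l_i the shortest-path length,
    and p_i the length of the path the agent actually took."""
    terms = [s * (l / max(p, l))
             for s, l, p in zip(successes, shortest_lengths, agent_lengths)]
    return sum(terms) / len(terms)

def relative_gain(old, new):
    """Relative improvement of `new` over `old` (same units)."""
    return (new - old) / old

# The relative gains quoted in the abstract follow from the absolute numbers:
print(f"Success: {relative_gain(71, 94):.1%}")  # -> 32.4%
print(f"SPL:     {relative_gain(53, 74):.1%}")  # -> 39.6%
```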
Related papers
- GaussNav: Gaussian Splatting for Visual Navigation [92.13664084464514]
Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.
Our framework constructs a novel map representation based on 3D Gaussian Splatting (3DGS)
Our framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset.
arXiv Detail & Related papers (2024-03-18T09:56:48Z) - Angle Robustness Unmanned Aerial Vehicle Navigation in GNSS-Denied
Scenarios [66.05091704671503]
We present a novel angle navigation paradigm to deal with flight deviation in point-to-point navigation tasks.
We also propose a model that includes the Adaptive Feature Enhance Module, Cross-knowledge Attention-guided Module and Robust Task-oriented Head Module.
arXiv Detail & Related papers (2024-02-04T08:41:20Z) - VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language
Model [28.79971953667143]
VoroNav is a semantic exploration framework to extract exploratory paths and planning nodes from a semantic map constructed in real time.
By harnessing topological and semantic information, VoroNav designs text-based descriptions of paths and images that are readily interpretable by a large language model.
arXiv Detail & Related papers (2024-01-05T08:05:07Z) - Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute over the previous SoTA) to a new best of 80% single-run success rate on the R2R test split by simple imitation learning.
arXiv Detail & Related papers (2023-07-28T16:03:28Z) - One-4-All: Neural Potential Fields for Embodied Navigation [10.452316044889177]
Real-world navigation can require long-horizon planning using high-dimensional RGB images.
One-4-All (O4A) is a method leveraging self-supervised and manifold learning to obtain a graph-free, end-to-end navigation pipeline.
We show that O4A can reach long-range goals in 8 simulated Gibson indoor environments.
arXiv Detail & Related papers (2023-03-07T16:25:41Z) - Unsupervised Visual Odometry and Action Integration for PointGoal
Navigation in Indoor Environment [14.363948775085534]
PointGoal navigation in indoor environments is a fundamental task for personal robots to navigate to a specified point.
To improve PointGoal navigation accuracy without a GPS signal, we use visual odometry (VO) and propose a novel action integration module (AIM) trained in an unsupervised manner.
Experiments show that the proposed system achieves satisfactory results and outperforms the partially supervised learning algorithms on the popular Gibson dataset.
arXiv Detail & Related papers (2022-10-02T03:12:03Z) - The Surprising Effectiveness of Visual Odometry Techniques for Embodied
PointGoal Navigation [100.08270721713149]
PointGoal navigation has been introduced in simulated Embodied AI environments.
Recent advances solve this PointGoal navigation task with near-perfect accuracy (99.6% success)
We show that integrating visual odometry techniques into navigation policies improves the state-of-the-art on the popular Habitat PointNav benchmark by a large margin.
arXiv Detail & Related papers (2021-08-26T02:12:49Z) - Sim-to-Real Transfer for Vision-and-Language Navigation [70.86250473583354]
We study the problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.
Recent work on the task of Vision-and-Language Navigation (VLN) has achieved significant progress in simulation.
To assess the implications of this work for robotics, we transfer a VLN agent trained in simulation to a physical robot.
arXiv Detail & Related papers (2020-11-07T16:49:04Z) - MVP: Unified Motion and Visual Self-Supervised Learning for Large-Scale
Robotic Navigation [23.54696982881734]
We propose a novel motion and visual perception approach, dubbed MVP, for large-scale, target-driven navigation tasks.
Our MVP-based method learns faster and is more accurate and robust to both extreme environmental changes and poor GPS data.
We evaluate our method on two large real-world datasets, Oxford Robotcar and Nordland Railway.
arXiv Detail & Related papers (2020-03-02T05:19:52Z)