Learning Synthetic to Real Transfer for Localization and Navigational Tasks
- URL: http://arxiv.org/abs/2011.10274v2
- Date: Mon, 23 Nov 2020 16:43:22 GMT
- Title: Learning Synthetic to Real Transfer for Localization and Navigational Tasks
- Authors: Maxime Pietrantoni, Boris Chidlovskii, Tomi Silander
- Abstract summary: Navigation lies at the crossroads of multiple disciplines, combining notions from computer vision, robotics and control.
This work aimed at creating, in simulation, a navigation pipeline whose transfer to the real world requires as little effort as possible.
Designing the navigation pipeline raises four main challenges: environment, localization, navigation and planning.
- Score: 7.019683407682642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous navigation is the ability of an agent to navigate without
human intervention or supervision; it involves both high-level planning and
low-level control. Navigation lies at the crossroads of multiple disciplines,
combining notions from computer vision, robotics and control. This work aimed
at creating, in simulation, a navigation pipeline whose transfer to the real
world requires as little effort as possible. Given the limited time and the
wide range of problems to be tackled, absolute navigation performance, while
important, was not the main objective. The emphasis was instead placed on
studying the sim2real gap, one of the major bottlenecks of modern robotics and
autonomous navigation. Designing the navigation pipeline raises four main
challenges: environment, localization, navigation and planning. The iGibson
simulator is chosen for its photo-realistic textures and physics engine. A
topological approach to space representation is preferred over metric
approaches because it generalizes better to new environments and is less
sensitive to changes in conditions. The navigation pipeline is decomposed into
a localization module, a planning module and a local navigation module. These
modules rely on three networks: an image representation extractor, a passage
detector and a local policy, each trained on a specifically tailored task with
an associated dataset created for it. Localization is the agent's ability to
locate itself against a specific space representation; it must be reliable,
repeatable and robust to a wide variety of transformations. Localization is
tackled as an image retrieval task, using a deep neural network trained on an
auxiliary task as a feature descriptor extractor. The local policy is trained
with behavioral cloning from expert trajectories gathered with the ROS
navigation stack.
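To make the two learning components concrete, here are two minimal sketches. Neither is the paper's implementation; the network sizes, the names (`localize`, `map_nodes`, `policy`) and the 512-dimensional descriptors are illustrative assumptions.

First, localization as image retrieval over a topological map: the query image's descriptor is matched against the reference descriptor of each map node by cosine similarity, and the best-matching node is returned.

```python
import numpy as np

def l2_normalize(v):
    # Normalizing descriptors makes the dot product a cosine similarity.
    return v / (np.linalg.norm(v) + 1e-8)

def localize(query_descriptor, map_nodes):
    """Return the topological node whose reference descriptor best
    matches the query image's descriptor (nearest-neighbor retrieval)."""
    q = l2_normalize(query_descriptor)
    best_node, best_sim = None, -1.0
    for node_id, ref_descriptor in map_nodes:
        sim = float(np.dot(q, l2_normalize(ref_descriptor)))
        if sim > best_sim:
            best_node, best_sim = node_id, sim
    return best_node, best_sim

# Toy usage: random 512-d vectors stand in for the extractor's outputs.
rng = np.random.default_rng(0)
map_nodes = [(i, rng.standard_normal(512)) for i in range(10)]
node, sim = localize(rng.standard_normal(512), map_nodes)
print(node, round(sim, 3))
```

Second, behavioral cloning for the local policy: supervised learning on expert (observation, action) pairs. The synthetic tensors below stand in for the trajectories recorded with the ROS navigation stack, and a discrete action space is assumed.

```python
import torch
import torch.nn as nn

# Synthetic stand-ins for expert data; in the paper these would come from
# trajectories gathered with the ROS navigation stack.
obs = torch.randn(64, 512)                   # observation features
expert_actions = torch.randint(0, 3, (64,))  # assumed discrete actions

policy = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 3))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Behavioral cloning: maximize the likelihood of the expert's
    # action given the observation (cross-entropy on action logits).
    loss = loss_fn(policy(obs), expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

At deployment, the planner would chain retrieved map nodes into a route, with the cloned policy executing the local transitions between them.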
Related papers
- Vision and Language Navigation in the Real World via Online Visual Language Mapping [18.769171505280127]
Vision-and-language navigation (VLN) methods are mainly evaluated in simulation.
We propose a novel framework to address the VLN task in the real world.
We evaluate the proposed pipeline on an Interbotix LoCoBot WX250 in an unseen lab environment.
arXiv Detail & Related papers (2023-10-16T20:44:09Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego²-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on the R2R-CE and RxR-CE datasets, respectively.
arXiv Detail & Related papers (2023-04-06T13:07:17Z)
- Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation [13.207579081178716]
In recent learning-based navigation approaches, the scene understanding and navigation abilities of the agent are achieved simultaneously.
Unfortunately, even though simulators are an efficient tool for training navigation policies, the resulting models often fail when transferred to the real world.
One possible solution is to provide the navigation model with mid-level visual representations containing important domain-invariant properties of the scene.
arXiv Detail & Related papers (2022-02-02T15:00:44Z)
- Structured Scene Memory for Vision-Language Navigation [155.63025602722712]
We propose a crucial architecture for vision-language navigation (VLN).
It is compartmentalized enough to accurately memorize the percepts during navigation.
It also serves as a structured scene representation, which captures and disentangles visual and geometric cues in the environment.
arXiv Detail & Related papers (2021-03-05T03:41:00Z)
- MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation [23.877609358505268]
Recent work shows that map-like memory is useful for long-horizon navigation tasks.
We propose the multiON task, which requires navigation to an episode-specific sequence of objects in a realistic environment.
We examine how a variety of agent models perform across a spectrum of navigation task complexities.
arXiv Detail & Related papers (2020-12-07T18:42:38Z)
- Unsupervised Domain Adaptation for Visual Navigation [115.85181329193092]
We propose an unsupervised domain adaptation method for visual navigation.
Our method translates the images in the target domain to the source domain such that the translation is consistent with the representations learned by the navigation policy.
arXiv Detail & Related papers (2020-10-27T18:22:43Z)
- Embodied Visual Navigation with Automatic Curriculum Learning in Real Environments [20.017277077448924]
NavACL is a method of automatic curriculum learning tailored to the navigation task.
Deep reinforcement learning agents trained using NavACL significantly outperform state-of-the-art agents trained with uniform sampling.
Our agents can navigate through unknown cluttered indoor environments to semantically-specified targets using only RGB images.
arXiv Detail & Related papers (2020-09-11T13:28:26Z)
- Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships [52.72020203771489]
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes.
Our proposed method combines visual features and 3D spatial representations to learn navigation policy.
Our experiments, performed in AI2-THOR, show that our model outperforms the baselines in both success rate (SR) and success weighted by path length (SPL).
arXiv Detail & Related papers (2020-04-29T08:46:38Z)
- Learning to Move with Affordance Maps [57.198806691838364]
The ability to autonomously explore and navigate a physical space is a fundamental requirement for virtually any mobile autonomous agent.
Traditional SLAM-based approaches for exploration and navigation largely focus on leveraging scene geometry.
We show that learned affordance maps can be used to augment traditional approaches for both exploration and navigation, providing significant improvements in performance.
arXiv Detail & Related papers (2020-01-08T04:05:11Z)