End-to-End Partially Observable Visual Navigation in a Diverse
Environment
- URL: http://arxiv.org/abs/2109.07752v1
- Date: Thu, 16 Sep 2021 06:53:57 GMT
- Title: End-to-End Partially Observable Visual Navigation in a Diverse
Environment
- Authors: Bo Ai, Wei Gao, Vinay, David Hsu
- Abstract summary: This work addresses three challenges: (i) complex visual observations, (ii) partial observability of local sensing, and (iii) multimodal navigation behaviors.
We propose a novel neural network (NN) architecture to represent a local controller and leverage the flexibility of the end-to-end approach to learn a powerful policy.
We implement the NN controller on the SPOT robot and evaluate it on three challenging tasks with partial observations.
- Score: 30.895264166384685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How can a robot navigate successfully in rich and diverse environments,
indoors or outdoors, along an office corridor or a trail in the park, on flat
ground, up a staircase, or in an elevator? To this end, this work addresses
three challenges: (i) complex visual observations, (ii) partial
observability of local sensing, and (iii) multimodal navigation behaviors that
depend on both the local environment and the high-level goal. We propose a
novel neural network (NN) architecture to represent a local controller and
leverage the flexibility of the end-to-end approach to learn a powerful policy.
To tackle complex visual observations, we extract multiscale spatial
information through convolution layers. To deal with partial observability, we
encode rich history information in LSTM-like modules. Importantly, we integrate
the two into a single unified architecture that exploits convolutional memory
cells to track the observation history at multiple spatial scales, which can
capture the complex spatiotemporal dependencies between observations and
controls. We additionally condition the network on the high-level goal in order
to generate different navigation behavior modes. Specifically, we propose to
use independent memory cells for different modes to prevent mode collapse in
the learned policy. We implement the NN controller on the SPOT robot and
evaluate it on three challenging tasks with partial observations: adversarial
pedestrian avoidance, blind-spot obstacle avoidance, and elevator riding. Our
model significantly outperforms CNNs, conventional LSTMs, and the ablated
versions of our model. A demo video will be publicly available, showing our
SPOT robot traversing many different locations on our university campus.
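To make the architecture concrete, here is a minimal sketch, assuming a PyTorch implementation: a convolutional LSTM cell whose hidden state is a spatial feature map (the paper stacks such memory cells at multiple spatial scales; the sketch shows a single scale), plus independent per-mode memory cells selected by the high-level goal. This is not the authors' code; all class and parameter names (ConvLSTMCell, ModeConditionedMemory, hid_ch, num_modes) are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of two ideas from the
# abstract: a convolutional memory cell that tracks observation history while
# preserving spatial layout, and independent per-mode memory cells selected
# by the high-level goal. All names here are illustrative assumptions.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """LSTM cell whose gates are convolutions, so the hidden state is a
    feature map rather than a flat vector."""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.hid_ch = hid_ch
        # One conv produces all four gates (input, forget, output, candidate).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)


class ModeConditionedMemory(nn.Module):
    """One independent memory cell per navigation mode; the high-level goal
    (a mode index) selects which cell is read and updated."""

    def __init__(self, in_ch: int, hid_ch: int, num_modes: int):
        super().__init__()
        self.cells = nn.ModuleList(
            ConvLSTMCell(in_ch, hid_ch) for _ in range(num_modes))

    def init_states(self, batch: int, height: int, width: int):
        def zeros():
            return torch.zeros(batch, self.cells[0].hid_ch, height, width)
        return [(zeros(), zeros()) for _ in self.cells]

    def forward(self, x, states, mode: int):
        # Only the selected mode's memory is read and written this step.
        h, states[mode] = self.cells[mode](x, states[mode])
        return h, states


# Usage: feed one observation feature map per timestep; the goal picks the mode.
mem = ModeConditionedMemory(in_ch=16, hid_ch=32, num_modes=3)
states = mem.init_states(batch=1, height=20, width=20)
h, states = mem(torch.randn(1, 16, 20, 20), states, mode=1)
```

Keeping a separate (h, c) state per mode means updates for one behavior mode never overwrite the memory of another, which is one plausible reading of how independent memory cells prevent mode collapse in the learned policy.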
Related papers
- Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning [58.69297999175239]
In robot learning, the observation space is crucial due to the distinct characteristics of different modalities.
In this study, we explore the influence of various observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud.
arXiv Detail & Related papers (2024-02-04T14:18:45Z)
- General-Purpose Multimodal Transformer meets Remote Sensing Semantic Segmentation [35.100738362291416]
Multimodal AI seeks to exploit complementary data sources, particularly for complex tasks like semantic segmentation.
Recent trends in general-purpose multimodal networks have shown great potential to achieve state-of-the-art performance.
We propose a UNet-inspired module that employs 3D convolution to encode vital local information and learn cross-modal features simultaneously.
arXiv Detail & Related papers (2023-07-07T04:58:34Z)
- Navigating to Objects in the Real World [76.1517654037993]
We present a large-scale empirical study of semantic visual navigation methods comparing methods from classical, modular, and end-to-end learning approaches.
We find that modular learning works well in the real world, attaining a 90% success rate.
In contrast, end-to-end learning does not, dropping from 77% simulation to 23% real-world success rate due to a large image domain gap between simulation and reality.
arXiv Detail & Related papers (2022-12-02T01:10:47Z)
- Polyline Based Generative Navigable Space Segmentation for Autonomous Visual Navigation [57.3062528453841]
We propose a representation-learning-based framework to enable robots to learn the navigable space segmentation in an unsupervised manner.
We show that the proposed PSV-Nets can learn the visual navigable space with high accuracy, even without a single label.
arXiv Detail & Related papers (2021-10-29T19:50:48Z)
- Towards real-world navigation with deep differentiable planners [0.0]
We train embodied neural networks to plan and navigate unseen complex 3D environments.
We focus on differentiable planners such as Value Iteration Networks (VIN), which are trained offline from safe expert demonstrations.
arXiv Detail & Related papers (2021-08-08T11:29:16Z)
- Structured Scene Memory for Vision-Language Navigation [155.63025602722712]
We propose a structured scene memory architecture for vision-language navigation (VLN).
It is compartmentalized enough to accurately memorize the percepts during navigation.
It also serves as a structured scene representation, which captures and disentangles visual and geometric cues in the environment.
arXiv Detail & Related papers (2021-03-05T03:41:00Z)
- Learning Synthetic to Real Transfer for Localization and Navigational Tasks [7.019683407682642]
Navigation lies at the crossroads of multiple disciplines: it combines notions from computer vision, robotics, and control.
This work aimed at creating, in simulation, a navigation pipeline whose transfer to the real world requires as little effort as possible.
In designing the navigation pipeline, four main challenges arise: environment, localization, navigation, and planning.
arXiv Detail & Related papers (2020-11-20T08:37:03Z)
- Learning to Set Waypoints for Audio-Visual Navigation [89.42192208471735]
In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sight and sound to find a sound source.
Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations.
We introduce a reinforcement learning approach to audio-visual navigation with two key novel elements.
arXiv Detail & Related papers (2020-08-21T18:00:33Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our method is validated on complex quadruped robot dynamics, and the approach can be applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.