Towards real-world navigation with deep differentiable planners
- URL: http://arxiv.org/abs/2108.05713v1
- Date: Sun, 8 Aug 2021 11:29:16 GMT
- Title: Towards real-world navigation with deep differentiable planners
- Authors: Shu Ishida, João F. Henriques
- Abstract summary: We train embodied neural networks to plan and navigate unseen complex 3D environments.
We focus on differentiable planners such as Value Iteration Networks (VIN), which are trained offline from safe expert demonstrations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We train embodied neural networks to plan and navigate unseen complex 3D
environments, emphasising real-world deployment. Rather than requiring prior
knowledge of the agent or environment, the planner learns to model the state
transitions and rewards. To avoid the potentially hazardous trial-and-error of
reinforcement learning, we focus on differentiable planners such as Value
Iteration Networks (VIN), which are trained offline from safe expert
demonstrations. Although such planners work well in small simulations, we
address two major limitations that hinder their deployment. First, we observe
that current differentiable planners struggle to plan over long horizons in
environments with high branching complexity. While they should ideally learn to
assign low rewards to obstacles in order to avoid collisions, we posit that the
constraints imposed on the network are not strong enough to guarantee that it
learns sufficiently large penalties for every possible collision. We therefore
impose a structural constraint on the value iteration, which explicitly learns
to model any impossible actions. Second, we extend the model to work with a
limited-perspective camera under translation and rotation, which is crucial for
real robot deployment. Many VIN-like planners assume a 360-degree or overhead view
without rotation. In contrast, our method uses a memory-efficient lattice map
to aggregate CNN embeddings of partial observations, and models the rotational
dynamics explicitly using a 3D state-space grid (translation and rotation). Our
proposals significantly improve semantic navigation and exploration on several
2D and 3D environments, succeeding in settings that are otherwise challenging
for this class of methods. As far as we know, we are the first to successfully
perform differentiable planning on the difficult Active Vision Dataset, which
consists of real images captured by a robot.
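To make the structural constraint above concrete, here is a minimal sketch of VIN-style value iteration with an explicit action-availability mask. It is an illustration under assumed tensor shapes, not the authors' implementation; the names `reward`, `action_mask` and `trans_kernel` are hypothetical.

```python
import torch
import torch.nn.functional as F

def masked_value_iteration(reward, action_mask, trans_kernel, gamma=0.99, iters=40):
    """reward:       (1, 1, H, W) learned reward map
    action_mask:  (1, A, H, W) 1.0 where an action is possible, 0.0 otherwise
    trans_kernel: (A, 1, 3, 3) learned transition kernels, one per action
    """
    v = torch.zeros_like(reward)  # value map V(s), initialised to zero
    for _ in range(iters):
        # Q(s, a) = R(s) + gamma * sum_{s'} P(s' | s, a) V(s'), as a convolution
        q = reward + gamma * F.conv2d(v, trans_kernel, padding=1)
        # Structural constraint: impossible actions are masked out directly, so
        # the max below can never select them; the network no longer has to
        # learn a large negative reward for every possible collision.
        q = q.masked_fill(action_mask == 0, -1e9)  # large negative stands in for -inf
        v, _ = q.max(dim=1, keepdim=True)  # V(s) = max_a Q(s, a)
    return v

# Toy usage on a 5x5 grid with 4 actions (all values are illustrative).
reward = torch.randn(1, 1, 5, 5)
action_mask = torch.ones(1, 4, 5, 5)
action_mask[:, 0, 0, :] = 0.0  # e.g. "move up" is impossible on the top row
trans_kernel = torch.rand(4, 1, 3, 3)
trans_kernel = trans_kernel / trans_kernel.sum(dim=(-2, -1), keepdim=True)
values = masked_value_iteration(reward, action_mask, trans_kernel)
```

In the full model described above, the same recurrence would run over a 3D state grid (translation and rotation), with the reward and mask predicted from the CNN embeddings aggregated in the lattice map.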
Related papers
- Deep Reinforcement Learning with Dynamic Graphs for Adaptive Informative Path Planning [22.48658555542736]
A key task in robotic data acquisition is planning paths through an initially unknown environment to collect observations.
We propose a novel deep reinforcement learning approach for adaptively replanning robot paths to map targets of interest in unknown 3D environments.
arXiv Detail & Related papers (2024-02-07T14:24:41Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning [55.517000360348725]
This work presents a framework for 3D scene understanding when labeled scenes are quite limited.
To extract knowledge for novel categories from the pre-trained vision-language models, we propose a hierarchical feature-aligned pre-training and knowledge distillation strategy.
Experiments with both indoor and outdoor scenes demonstrated the effectiveness of our approach in both data-efficient learning and open-world few-shot learning.
arXiv Detail & Related papers (2023-12-01T15:47:04Z)
- SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning [15.346150968195015]
We introduce SayPlan, a scalable approach to large-scale task planning for robotics using 3D scene graph (3DSG) representations.
We evaluate our approach on two large-scale environments spanning up to 3 floors and 36 rooms with 140 assets and objects.
arXiv Detail & Related papers (2023-07-12T12:37:55Z)
- CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [80.42480679542697]
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to learn transferable 3D point cloud representations in realistic scenarios.
Specifically, we exploit naturally existing correspondences between 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies from those complex scenarios.
arXiv Detail & Related papers (2023-03-22T09:32:45Z)
- Learning Forward Dynamics Model and Informed Trajectory Sampler for Safe Quadruped Navigation [1.2783783498844021]
A typical state-of-the-art system is composed of four main modules: mapper, global planner, local planner, and command-tracking controller.
We build a robust and safe local planner designed to generate a velocity plan that tracks a coarsely planned path from the global planner.
Using our framework, a quadruped robot can autonomously navigate in various complex environments without collisions and generate a smoother command plan than the baseline method.
arXiv Detail & Related papers (2022-04-19T04:01:44Z)
- SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations [85.38562724999898]
We propose a 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU.
Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module and an inter-modal feature interaction module.
To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets.
arXiv Detail & Related papers (2021-12-09T03:27:00Z)
- End-to-End Partially Observable Visual Navigation in a Diverse Environment [30.895264166384685]
This work tackles three challenges: (i) complex visual observations, (ii) partial observability of local sensing, and (iii) multimodal navigation behaviors.
We propose a novel neural network (NN) architecture to represent a local controller and leverage the flexibility of the end-to-end approach to learn a powerful policy.
We implement the NN controller on the SPOT robot and evaluate it on three challenging tasks with partial observations.
arXiv Detail & Related papers (2021-09-16T06:53:57Z)
- Learning Synthetic to Real Transfer for Localization and Navigational Tasks [7.019683407682642]
Navigation lies at the crossroads of multiple disciplines: it combines notions from computer vision, robotics, and control.
This work aimed at creating, in simulation, a navigation pipeline whose transfer to the real world could be done with as little effort as possible.
Designing the navigation pipeline raises four main challenges: environment, localization, navigation, and planning.
arXiv Detail & Related papers (2020-11-20T08:37:03Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our method is validated on complex quadruped robot dynamics, and the approach can be applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
- Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships [52.72020203771489]
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes.
Our proposed method combines visual features and 3D spatial representations to learn navigation policy.
Our experiments, performed in AI2-THOR, show that our model outperforms the baselines on both the SR and SPL metrics (SPL is sketched below).
arXiv Detail & Related papers (2020-04-29T08:46:38Z)
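The SR and SPL metrics in the entry above are standard embodied-navigation measures; here is a minimal sketch of SPL (Success weighted by Path Length, Anderson et al., 2018) with illustrative variable names.

```python
def spl(successes, shortest_lengths, path_lengths):
    """successes[i] in {0, 1}; shortest_lengths[i] is the geodesic distance
    from start to goal; path_lengths[i] is the length the agent travelled."""
    return sum(
        s * l / max(p, l)  # per-episode success, discounted by path efficiency
        for s, l, p in zip(successes, shortest_lengths, path_lengths)
    ) / len(successes)

# Example: one optimal success and one failure average to an SPL of 0.5.
print(spl([1, 0], [10.0, 8.0], [10.0, 20.0]))  # -> 0.5
```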
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.