Towards real-world navigation with deep differentiable planners
- URL: http://arxiv.org/abs/2108.05713v1
- Date: Sun, 8 Aug 2021 11:29:16 GMT
- Title: Towards real-world navigation with deep differentiable planners
- Authors: Shu Ishida, João F. Henriques
- Abstract summary: We train embodied neural networks to plan and navigate unseen complex 3D environments.
We focus on differentiable planners such as Value Iteration Networks (VIN), which are trained offline from safe expert demonstrations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We train embodied neural networks to plan and navigate unseen complex 3D
environments, emphasising real-world deployment. Rather than requiring prior
knowledge of the agent or environment, the planner learns to model the state
transitions and rewards. To avoid the potentially hazardous trial-and-error of
reinforcement learning, we focus on differentiable planners such as Value
Iteration Networks (VIN), which are trained offline from safe expert
demonstrations. Although such planners work well in small simulations, we
address two major limitations that hinder their deployment. First, we observe
that current differentiable planners struggle to plan over long horizons in
environments with high branching complexity. While they should ideally learn to
assign low rewards to obstacles in order to avoid collisions, we posit that the
constraints imposed on the network are not strong enough to guarantee that it
learns sufficiently large penalties for every possible collision. We therefore
impose a structural constraint on the value iteration, which explicitly learns
to model any impossible actions. Second, we extend the model to work with a
limited-perspective camera under translation and rotation, which is crucial for
real robot deployment. Many VIN-like planners assume a 360-degree or overhead view
without rotation. In contrast, our method uses a memory-efficient lattice map
to aggregate CNN embeddings of partial observations, and models the rotational
dynamics explicitly using a 3D state-space grid (translation and rotation). Our
proposals significantly improve semantic navigation and exploration on several
2D and 3D environments, succeeding in settings that are otherwise challenging
for this class of methods. As far as we know, we are the first to successfully
perform differentiable planning on the difficult Active Vision Dataset, which
consists of real images captured by a robot.
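To make the structural constraint above concrete, here is a minimal sketch of VIN-style value iteration with an explicit action-availability mask. It is an illustration under assumed tensor shapes, not the authors' implementation; the names `reward`, `action_mask` and `trans_kernel` are hypothetical.

```python
import torch
import torch.nn.functional as F

def masked_value_iteration(reward, action_mask, trans_kernel, gamma=0.99, iters=40):
    """reward:       (1, 1, H, W) learned reward map
    action_mask:  (1, A, H, W) 1.0 where an action is possible, 0.0 otherwise
    trans_kernel: (A, 1, 3, 3) learned transition kernels, one per action
    """
    v = torch.zeros_like(reward)  # value map V(s), initialised to zero
    for _ in range(iters):
        # Q(s, a) = R(s) + gamma * sum_{s'} P(s' | s, a) V(s'), as a convolution
        q = reward + gamma * F.conv2d(v, trans_kernel, padding=1)
        # Structural constraint: impossible actions are masked out directly, so
        # the max below can never select them; the network no longer has to
        # learn a large negative reward for every possible collision.
        q = q.masked_fill(action_mask == 0, -1e9)  # large negative stands in for -inf
        v, _ = q.max(dim=1, keepdim=True)  # V(s) = max_a Q(s, a)
    return v

# Toy usage on a 5x5 grid with 4 actions (all values are illustrative).
reward = torch.randn(1, 1, 5, 5)
action_mask = torch.ones(1, 4, 5, 5)
action_mask[:, 0, 0, :] = 0.0  # e.g. "move up" is impossible on the top row
trans_kernel = torch.rand(4, 1, 3, 3)
trans_kernel = trans_kernel / trans_kernel.sum(dim=(-2, -1), keepdim=True)
values = masked_value_iteration(reward, action_mask, trans_kernel)
```

In the full model described above, the same recurrence would run over a 3D state grid (translation and rotation), with the reward and mask predicted from the CNN embeddings aggregated in the lattice map.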
Related papers
- Deep Reinforcement Learning with Dynamic Graphs for Adaptive Informative Path Planning [22.48658555542736]
A key task in robotic data acquisition is planning paths through an initially unknown environment to collect observations.
We propose a novel deep reinforcement learning approach for adaptively replanning robot paths to map targets of interest in unknown 3D environments.
arXiv Detail & Related papers (2024-02-07T14:24:41Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning [55.517000360348725]
This work presents a framework for 3D scene understanding when labeled scenes are quite limited.
To extract knowledge for novel categories from the pre-trained vision-language models, we propose a hierarchical feature-aligned pre-training and knowledge distillation strategy.
Experiments with both indoor and outdoor scenes demonstrated the effectiveness of our approach in both data-efficient learning and open-world few-shot learning.
arXiv Detail & Related papers (2023-12-01T15:47:04Z)
- SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning [15.346150968195015]
We introduce SayPlan, a scalable approach to large-scale task planning for robotics using 3D scene graph (3DSG) representations.
We evaluate our approach on two large-scale environments spanning up to 3 floors and 36 rooms with 140 assets and objects.
arXiv Detail & Related papers (2023-07-12T12:37:55Z)
- CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [80.42480679542697]
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to learn transferable 3D point cloud representations in realistic scenarios.
Specifically, we exploit naturally existing correspondences between 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies from those complex scenarios.
arXiv Detail & Related papers (2023-03-22T09:32:45Z)
- Learning Forward Dynamics Model and Informed Trajectory Sampler for Safe Quadruped Navigation [1.2783783498844021]
A typical state-of-the-art system is composed of four main modules: mapper, global planner, local planner, and command-tracking controller.
We build a robust and safe local planner designed to generate a velocity plan that tracks a coarsely planned path from the global planner.
Using our framework, a quadruped robot can autonomously navigate in various complex environments without collisions and generate a smoother command plan than the baseline method.
arXiv Detail & Related papers (2022-04-19T04:01:44Z)
- SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations [85.38562724999898]
We propose a 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU.
Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module and an inter-modal feature interaction module.
To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets.
arXiv Detail & Related papers (2021-12-09T03:27:00Z)
- End-to-End Partially Observable Visual Navigation in a Diverse Environment [30.895264166384685]
This work tackles three challenges: (i) complex visual observations, (ii) partial observability of local sensing, and (iii) multimodal navigation behaviors.
We propose a novel neural network (NN) architecture to represent a local controller and leverage the flexibility of the end-to-end approach to learn a powerful policy.
We implement the NN controller on the SPOT robot and evaluate it on three challenging tasks with partial observations.
arXiv Detail & Related papers (2021-09-16T06:53:57Z)
- Learning Synthetic to Real Transfer for Localization and Navigational Tasks [7.019683407682642]
Navigation lies at the crossroads of multiple disciplines: it combines notions from computer vision, robotics, and control.
This work aimed at creating, in simulation, a navigation pipeline whose transfer to the real world could be done with as little effort as possible.
Designing the navigation pipeline raises four main challenges: environment, localization, navigation, and planning.
arXiv Detail & Related papers (2020-11-20T08:37:03Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our method is validated on complex quadruped robot dynamics, and the approach can be applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
- Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships [52.72020203771489]
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes.
Our proposed method combines visual features and 3D spatial representations to learn navigation policy.
Our experiments, performed in AI2-THOR, show that our model outperforms the baselines on both the SR and SPL metrics (SPL is sketched below).
arXiv Detail & Related papers (2020-04-29T08:46:38Z)
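The SR and SPL metrics in the entry above are standard embodied-navigation measures; here is a minimal sketch of SPL (Success weighted by Path Length, Anderson et al., 2018) with illustrative variable names.

```python
def spl(successes, shortest_lengths, path_lengths):
    """successes[i] in {0, 1}; shortest_lengths[i] is the geodesic distance
    from start to goal; path_lengths[i] is the length the agent travelled."""
    return sum(
        s * l / max(p, l)  # per-episode success, discounted by path efficiency
        for s, l, p in zip(successes, shortest_lengths, path_lengths)
    ) / len(successes)

# Example: one optimal success and one failure average to an SPL of 0.5.
print(spl([1, 0], [10.0, 8.0], [10.0, 20.0]))  # -> 0.5
```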
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.