Neural Scene Representation for Locomotion on Structured Terrain
- URL: http://arxiv.org/abs/2206.08077v1
- Date: Thu, 16 Jun 2022 10:45:17 GMT
- Title: Neural Scene Representation for Locomotion on Structured Terrain
- Authors: David Hoeller, Nikita Rudin, Christopher Choy, Animashree Anandkumar, Marco Hutter
- Abstract summary: We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the algorithm estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
- Score: 56.48607865960868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a learning-based method to reconstruct the local terrain for
locomotion with a mobile robot traversing urban environments. Using a stream of
depth measurements from the onboard cameras and the robot's trajectory, the
algorithm estimates the topography in the robot's vicinity. The raw
measurements from these cameras are noisy and only provide partial and occluded
observations that in many cases do not show the terrain the robot stands on.
Therefore, we propose a 3D reconstruction model that faithfully reconstructs
the scene, despite the noisy measurements and large amounts of missing data
coming from the blind spots of the camera arrangement. The model consists of a
4D fully convolutional network on point clouds that learns the geometric priors
to complete the scene from the context and an auto-regressive feedback to
leverage spatio-temporal consistency and use evidence from the past. The
network can be solely trained with synthetic data, and due to extensive
augmentation, it is robust in the real world, as shown in the validation on a
quadrupedal robot, ANYmal, traversing challenging settings. We run the pipeline
on the robot's onboard low-power computer using an efficient sparse tensor
implementation and show that the proposed method outperforms classical map
representations.
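The auto-regressive feedback described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the learned 4D fully convolutional network is replaced by a hypothetical `placeholder_network` that returns its input unchanged, and the function names, the voxel size, and the crop radius are all assumptions chosen for illustration. The sketch only shows the data flow: the previous estimate is moved into the current robot frame using the trajectory, merged with the new depth points, and quantized to sparse voxel coordinates before being passed to the model.

```python
import numpy as np


def voxelize(points, voxel_size):
    """Quantize (N, 3) points to unique integer voxel coordinates."""
    if len(points) == 0:
        return np.empty((0, 3), dtype=np.int64)
    coords = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(coords, axis=0)


def to_current_frame(points, T_prev_to_curr):
    """Apply a 4x4 homogeneous transform to (N, 3) points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T_prev_to_curr.T)[:, :3]


def placeholder_network(coords):
    """Hypothetical stand-in for the learned model (identity)."""
    return coords


def reconstruction_step(measured, prev_estimate, T_prev_to_curr,
                        voxel_size=0.1, radius=2.0):
    """One feedback step: reuse the past estimate alongside new points."""
    # Express the previous output in the current robot frame.
    carried = to_current_frame(prev_estimate, T_prev_to_curr)
    merged = np.vstack([measured, carried])
    # Keep only the robot's vicinity (horizontal distance to the origin).
    merged = merged[np.linalg.norm(merged[:, :2], axis=1) <= radius]
    coords = placeholder_network(voxelize(merged, voxel_size))
    # Return voxel centers in metres as the new estimate.
    return (coords + 0.5) * voxel_size


# Frame 1: one measured point, no history yet.
estimate = reconstruction_step(np.array([[1.02, 0.03, 0.04]]),
                               np.empty((0, 3)), np.eye(4))

# Frame 2: the robot moved 0.5 m along x, so old points shift by -0.5 m.
T = np.eye(4)
T[0, 3] = -0.5
estimate = reconstruction_step(np.array([[1.52, 0.03, 0.04]]), estimate, T)
print(estimate)
```

In the second frame the point observed earlier is no longer measured, yet it remains in the estimate because the previous output is fed back in. This mirrors, in a very simplified way, how evidence from the past can fill the blind spots of the camera arrangement.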
Related papers
- Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction [51.49400490437258]
This work develops a method for imitating articulated object manipulation from a single monocular RGB human demonstration.
We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video.
Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion.
We evaluate 4D-DPM's 3D tracking accuracy on ground truth annotated 3D part trajectories and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot.
arXiv Detail & Related papers (2024-09-26T17:57:16Z)
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- PoseINN: Realtime Visual-based Pose Regression and Localization with Invertible Neural Networks [3.031375888004876]
Estimating ego-pose from cameras is an important problem in robotics with applications ranging from mobile robotics to augmented reality.
We propose to solve the problem by using invertible neural networks (INN) to find the mapping between the latent space of images and poses for a given scene.
Our model achieves similar performance to the SOTA while being faster to train and only requiring offline rendering of low-resolution synthetic data.
arXiv Detail & Related papers (2024-04-20T06:25:32Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- NSLF-OL: Online Learning of Neural Surface Light Fields alongside Real-time Incremental 3D Reconstruction [0.76146285961466]
The paper proposes a novel Neural Surface Light Fields model that copes with the small range of view directions while producing a good result in unseen directions.
Our model learns online the Neural Surface Light Fields (NSLF) aside from real-time 3D reconstruction with a sequential data stream as the shared input.
In addition to online training, our model also provides real-time rendering after completing the data stream for visualization.
arXiv Detail & Related papers (2023-04-29T15:41:15Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer [26.21320177775571]
We propose an end-to-end pose estimation framework that is capable of online camera-to-robot calibration and a self-supervised training method.
Our framework combines deep learning and geometric vision for solving the robot pose, and the pipeline is fully differentiable.
arXiv Detail & Related papers (2023-02-28T05:55:42Z)
- Leveraging Deepfakes to Close the Domain Gap between Real and Synthetic Images in Facial Capture Pipelines [8.366597450893456]
We propose an end-to-end pipeline for building and tracking 3D facial models from personalized in-the-wild video data.
We present a method for automatic data curation and retrieval based on a hierarchical clustering framework typical of collision algorithms in traditional computer graphics pipelines.
We outline how we train a motion capture regressor, leveraging the aforementioned techniques to avoid the need for real-world ground truth data.
arXiv Detail & Related papers (2022-04-22T15:09:49Z)
- Solving Occlusion in Terrain Mapping with Neural Networks [7.703348666813963]
We introduce a self-supervised learning approach capable of training on real-world data without a need for ground-truth information.
Our neural network is able to run in real-time on both CPU and GPU with suitable sampling rates for autonomous ground robots.
arXiv Detail & Related papers (2021-09-15T08:30:16Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.