Neural Scene Representation for Locomotion on Structured Terrain
- URL: http://arxiv.org/abs/2206.08077v1
- Date: Thu, 16 Jun 2022 10:45:17 GMT
- Title: Neural Scene Representation for Locomotion on Structured Terrain
- Authors: David Hoeller, Nikita Rudin, Christopher Choy, Animashree Anandkumar,
Marco Hutter
- Abstract summary: We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the algorithm estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
- Score: 56.48607865960868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a learning-based method to reconstruct the local terrain for
locomotion with a mobile robot traversing urban environments. Using a stream of
depth measurements from the onboard cameras and the robot's trajectory, the
algorithm estimates the topography in the robot's vicinity. The raw
measurements from these cameras are noisy and only provide partial and occluded
observations that in many cases do not show the terrain the robot stands on.
Therefore, we propose a 3D reconstruction model that faithfully reconstructs
the scene, despite the noisy measurements and large amounts of missing data
coming from the blind spots of the camera arrangement. The model consists of a
4D fully convolutional network on point clouds that learns the geometric priors
to complete the scene from the context and an auto-regressive feedback to
leverage spatio-temporal consistency and use evidence from the past. The
network can be solely trained with synthetic data, and due to extensive
augmentation, it is robust in the real world, as shown in the validation on a
quadrupedal robot, ANYmal, traversing challenging settings. We run the pipeline
on the robot's onboard low-power computer using an efficient sparse tensor
implementation and show that the proposed method outperforms classical map
representations.
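The two core ideas in the abstract, sparse 4D (space-time) coordinates as network input and an auto-regressive feedback that carries past evidence forward, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (which uses a learned 4D fully convolutional network on sparse tensors); the voxel size, fusion weight `alpha`, and NaN convention for unobserved cells are illustrative assumptions.

```python
import numpy as np

def voxelize_4d(points, t, voxel_size=0.05):
    """Quantize a 3D point cloud captured at timestep t into unique sparse
    (t, x, y, z) integer coordinates, the input format expected by a 4D
    sparse convolutional network."""
    coords = np.floor(points / voxel_size).astype(np.int32)
    coords = np.unique(coords, axis=0)                    # drop duplicate voxels
    t_col = np.full((coords.shape[0], 1), t, dtype=np.int32)
    return np.hstack([t_col, coords])                     # shape (N, 4)

def autoregressive_fuse(prev_height, new_height, alpha=0.7):
    """Blend the previous terrain estimate with the new one: keep past
    evidence where the new estimate is missing (NaN marks cells that fall
    in camera blind spots), otherwise take a weighted combination."""
    return np.where(np.isnan(new_height), prev_height,
                    alpha * new_height + (1 - alpha) * prev_height)
```

In the actual pipeline, the fusion step is learned jointly with the reconstruction network rather than being a fixed weighted average; the sketch only shows how past estimates can fill blind spots frame to frame.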
Related papers
- Pre-training Auto-regressive Robotic Models with 4D Representations [43.80798244473759]
ARM4R is an Auto-regressive Robotic Model that leverages low-level 4D Representations learned from human video data to yield a better pre-trained robotic model.
Our experiments show that ARM4R can transfer efficiently from human video data to robotics and consistently improves performance on tasks across various robot environments and configurations.
arXiv Detail & Related papers (2025-02-18T18:59:01Z) - Watch Your STEPP: Semantic Traversability Estimation using Pose Projected Features [4.392942391043664]
We propose a method for estimating terrain traversability by learning from demonstrations of human walking.
Our approach leverages dense, pixel-wise feature embeddings generated using the DINOv2 vision Transformer model.
By minimizing loss, the network distinguishes between familiar terrain with a low reconstruction error and unfamiliar or hazardous terrain with a higher reconstruction error.
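The reconstruction-error idea above can be sketched with a simple stand-in: fit a low-rank subspace (PCA) to feature embeddings of terrain the robot has already traversed, and score new embeddings by their reconstruction error under that subspace. The paper trains an autoencoder on DINOv2 features; the PCA surrogate, 4-dimensional toy features, and function names here are illustrative assumptions.

```python
import numpy as np

def fit_familiar_subspace(features, k=2):
    """Fit a rank-k subspace to embeddings of familiar terrain; a stand-in
    for training an autoencoder to reconstruct such embeddings."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:k]                 # rows of vt[:k] are orthonormal

def traversability_score(x, mean, basis):
    """Reconstruction error of a feature vector under the familiar
    subspace: low error suggests familiar terrain, high error suggests
    unfamiliar or hazardous terrain."""
    proj = (x - mean) @ basis.T @ basis + mean
    return float(np.linalg.norm(x - proj))
```

A vector lying in the familiar subspace reconstructs with near-zero error, while one pointing outside it scores high, which is the same decision rule the abstract describes for the learned model.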
arXiv Detail & Related papers (2025-01-29T11:53:58Z) - Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos [76.07894127235058]
We present a system for mining high-quality 4D reconstructions from internet stereoscopic, wide-angle videos.
We use this method to generate large-scale data in the form of world-consistent, pseudo-metric 3D point clouds.
We demonstrate the utility of this data by training a variant of DUSt3R to predict structure and 3D motion from real-world image pairs.
arXiv Detail & Related papers (2024-12-12T18:59:54Z) - Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction [51.49400490437258]
This work develops a method for imitating articulated object manipulation from a single monocular RGB human demonstration.
We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video.
Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion.
We evaluate 4D-DPM's 3D tracking accuracy on ground truth annotated 3D part trajectories and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot.
arXiv Detail & Related papers (2024-09-26T17:57:16Z) - Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z) - Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z) - Markerless Camera-to-Robot Pose Estimation via Self-supervised
Sim-to-Real Transfer [26.21320177775571]
We propose an end-to-end pose estimation framework that is capable of online camera-to-robot calibration and a self-supervised training method.
Our framework combines deep learning and geometric vision for solving the robot pose, and the pipeline is fully differentiable.
arXiv Detail & Related papers (2023-02-28T05:55:42Z) - Leveraging Deepfakes to Close the Domain Gap between Real and Synthetic
Images in Facial Capture Pipelines [8.366597450893456]
We propose an end-to-end pipeline for building and tracking 3D facial models from personalized in-the-wild video data.
We present a method for automatic data curation and retrieval based on a hierarchical clustering framework typical of collision algorithms in traditional computer graphics pipelines.
We outline how we train a motion capture regressor, leveraging the aforementioned techniques to avoid the need for real-world ground truth data.
arXiv Detail & Related papers (2022-04-22T15:09:49Z) - Solving Occlusion in Terrain Mapping with Neural Networks [7.703348666813963]
We introduce a self-supervised learning approach capable of training on real-world data without a need for ground-truth information.
Our neural network is able to run in real-time on both CPU and GPU with suitable sampling rates for autonomous ground robots.
arXiv Detail & Related papers (2021-09-15T08:30:16Z) - Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for
Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our method is validated on complex quadruped robot dynamics, and the approach can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
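One way an RNN-inferred state covariance can feed a risk-averse MPC cost, as in the entry above, is to inflate the obstacle clearance margin by a multiple of the predicted standard deviation. This is a hedged sketch, not the paper's formulation; the quadratic penalty, `base_radius`, and `kappa` are illustrative assumptions.

```python
import numpy as np

def risk_inflated_cost(path, obstacle, covariances, base_radius=0.5, kappa=2.0):
    """Penalize path points whose distance to a predicted obstacle falls
    inside a clearance margin inflated by kappa standard deviations along
    the largest eigenvector of the predicted covariance."""
    cost = 0.0
    for p, cov in zip(path, covariances):
        sigma = np.sqrt(np.linalg.eigvalsh(cov)[-1])   # largest std. dev.
        margin = base_radius + kappa * sigma           # uncertainty-inflated radius
        d = np.linalg.norm(p - obstacle)
        if d < margin:
            cost += (margin - d) ** 2                  # quadratic violation penalty
    return cost
```

The effect is that the same path becomes more expensive when the network predicts higher uncertainty about the obstacle's future location, which is the behavior a risk-averse planner wants.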
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.