Neural Volumetric Memory for Visual Locomotion Control
- URL: http://arxiv.org/abs/2304.01201v1
- Date: Mon, 3 Apr 2023 17:59:56 GMT
- Title: Neural Volumetric Memory for Visual Locomotion Control
- Authors: Ruihan Yang, Ge Yang, Xiaolong Wang
- Abstract summary: In this work, we consider the difficult problem of locomotion on challenging terrains using a single forward-facing depth camera.
To solve this problem, we follow the paradigm in computer vision that explicitly models the 3D geometry of the scene.
We show that our approach, which explicitly introduces geometric priors during training, offers superior performance compared to more naïve methods.
- Score: 11.871849736648237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Legged robots have the potential to expand the reach of autonomy beyond paved
roads. In this work, we consider the difficult problem of locomotion on
challenging terrains using a single forward-facing depth camera. Due to the
partial observability of the problem, the robot has to rely on past
observations to infer the terrain currently beneath it. To solve this problem,
we follow the paradigm in computer vision that explicitly models the 3D
geometry of the scene and propose Neural Volumetric Memory (NVM), a geometric
memory architecture that explicitly accounts for the SE(3) equivariance of the
3D world. NVM aggregates feature volumes from multiple camera views by first
bringing them back to the ego-centric frame of the robot. We test the learned
visual-locomotion policy on a physical robot and show that our approach, which
explicitly introduces geometric priors during training, offers superior
performance compared to more naïve methods. We also include ablation studies and
show that the representations stored in the neural volumetric memory capture
sufficient geometric information to reconstruct the scene. Our project page
with videos is https://rchalyang.github.io/NVM .
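To make the ego-centric aggregation idea concrete, below is a minimal PyTorch sketch of warping per-view feature volumes into the robot's current frame with an SE(3) transform and fusing them. This is not the authors' implementation: the shapes, the averaging fusion, and the use of normalized grid coordinates are illustrative assumptions; the project page linked above is the authoritative reference.

```python
# Sketch only: illustrative shapes and fusion, not the NVM implementation from the paper.
import torch
import torch.nn.functional as F

def make_base_grid(D, H, W, device):
    """Normalized 3D sampling grid defined in the robot's current (ego-centric) frame."""
    zs, ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, D, device=device),
        torch.linspace(-1, 1, H, device=device),
        torch.linspace(-1, 1, W, device=device),
        indexing="ij",
    )
    return torch.stack([xs, ys, zs], dim=-1).reshape(-1, 3)  # (D*H*W, 3), (x, y, z) order

def warp_volume_to_ego(feat_vol, R, t):
    """Resample one past view's feature volume into the current ego-centric frame.

    feat_vol: (1, C, D, H, W) feature volume built from a past camera view.
    R, t:     rotation (3, 3) and translation (3,) mapping current-frame points
              into that past view's frame (the SE(3) relative pose).
    Note: applying a rigid transform directly to normalized coordinates is a
    simplification; a real model would handle metric scale explicitly.
    """
    _, _, D, H, W = feat_vol.shape
    pts = make_base_grid(D, H, W, feat_vol.device)   # points in the current frame
    pts_src = pts @ R.T + t                          # the same points in the past frame
    grid = pts_src.reshape(1, D, H, W, 3)            # grid_sample expects (x, y, z) order
    return F.grid_sample(feat_vol, grid, align_corners=True)

def aggregate_memory(volumes, transforms):
    """Fuse the warped volumes; plain averaging stands in for a learned fusion module."""
    warped = [warp_volume_to_ego(v, R, t) for v, (R, t) in zip(volumes, transforms)]
    return torch.stack(warped, dim=0).mean(dim=0)

if __name__ == "__main__":
    vols = [torch.randn(1, 16, 8, 16, 16) for _ in range(3)]  # three past views
    identity = (torch.eye(3), torch.zeros(3))                 # no relative motion
    fused = aggregate_memory(vols, [identity] * 3)
    print(fused.shape)  # torch.Size([1, 16, 8, 16, 16])
```

With identity transforms the fused volume simply averages the inputs; with the robot's actual relative poses, each past volume is resampled so that corresponding 3D locations line up before fusion, which is the property an SE(3)-aware memory is built around.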
Related papers
- OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation [30.76201018651464]
Traditional 3D scene understanding approaches rely on expensive labeled 3D datasets to train a model for a single task with supervision.
We propose OpenOcc, a novel framework that unifies 3D scene reconstruction and open-vocabulary understanding with neural radiance fields.
We show that our approach achieves competitive performance in 3D scene understanding tasks, especially for small and long-tail objects.
arXiv Detail & Related papers (2024-03-18T13:53:48Z)
- SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction [77.15924044466976]
We propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences.
We first transform the images into 3D space (e.g., a bird's eye view) to obtain a 3D representation of the scene.
We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations.
arXiv Detail & Related papers (2023-11-21T17:59:14Z)
- BAA-NGP: Bundle-Adjusting Accelerated Neural Graphics Primitives [6.431806897364565]
Implicit neural representations have become pivotal in robotic perception, enabling robots to comprehend 3D environments from 2D images.
We propose a framework called bundle-adjusting accelerated neural graphics primitives (BAA-NGP).
Results demonstrate a 10 to 20x speed improvement over other bundle-adjusting neural radiance field methods.
arXiv Detail & Related papers (2023-06-07T05:36:45Z)
- Visibility Aware Human-Object Interaction Tracking from Single RGB Camera [40.817960406002506]
We propose a novel method to track the 3D human, object, contacts between them, and their relative translation across frames from a single RGB camera.
We condition our neural field reconstructions for human and object on per-frame SMPL model estimates obtained by pre-fitting SMPL to a video sequence.
Human and object motion from visible frames provides valuable information to infer the occluded object.
arXiv Detail & Related papers (2023-03-29T06:23:44Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.