Footprints and Free Space from a Single Color Image
- URL: http://arxiv.org/abs/2004.06376v1
- Date: Tue, 14 Apr 2020 09:29:17 GMT
- Title: Footprints and Free Space from a Single Color Image
- Authors: Jamie Watson, Michael Firman, Aron Monszpart, Gabriel J. Brostow
- Abstract summary: We introduce a model to predict the geometry of both visible and occluded traversable surfaces, given a single RGB image as input.
We learn from stereo video sequences, using camera poses, per-frame depth and semantic segmentation to form training data.
We find that a surprisingly low bar for spatial coverage of training scenes is required.
- Score: 32.57664001590537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the shape of a scene from a single color image is a formidable
computer vision task. However, most methods aim to predict the geometry of
surfaces that are visible to the camera, which is of limited use when planning
paths for robots or augmented reality agents. Such agents can only move when
grounded on a traversable surface, which we define as the set of classes which
humans can also walk over, such as grass, footpaths and pavement. Models which
predict beyond the line of sight often parameterize the scene with voxels or
meshes, which can be expensive to use in machine learning frameworks.
We introduce a model to predict the geometry of both visible and occluded
traversable surfaces, given a single RGB image as input. We learn from stereo
video sequences, using camera poses, per-frame depth and semantic segmentation
to form training data, which is used to supervise an image-to-image network. We
train models from the KITTI driving dataset, the indoor Matterport dataset, and
from our own casually captured stereo footage. We find that a surprisingly low
bar for spatial coverage of training scenes is required. We validate our
algorithm against a range of strong baselines, and include an assessment of our
predictions for a path-planning task.
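The abstract describes supervising an image-to-image network with training data built from stereo video: camera poses, per-frame depth and semantic segmentation. As a rough illustration only (not the authors' code; every function, variable and camera convention below is an assumption), traversable pixels observed anywhere in a sequence could be reprojected into a reference view to form a footprint mask that also covers ground hidden behind obstacles in that view:

```python
# Minimal sketch, assuming pinhole intrinsics K, cam-to-world poses, and
# per-frame depth + semantic segmentation, of how a traversable-footprint
# label for a reference image might be accumulated from a stereo sequence.
import numpy as np

def backproject(depth, K):
    """Lift an HxW depth map to Nx3 points in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3xN
    rays = np.linalg.inv(K) @ pix                                      # 3xN
    return (rays * depth.reshape(1, -1)).T                             # Nx3

def footprint_label(ref_pose, K, frames, traversable_ids, shape):
    """Mark reference-view pixels whose ray lands on traversable ground seen
    in *any* frame, including ground occluded in the reference image.
    frames: list of (depth, segmentation, cam_to_world pose) tuples."""
    h, w = shape
    label = np.zeros((h, w), dtype=bool)
    ref_inv = np.linalg.inv(ref_pose)              # world -> reference camera
    for depth, seg, pose in frames:
        pts = backproject(depth, K)                # Nx3, that frame's camera
        pts = pts[np.isin(seg.reshape(-1), traversable_ids)]
        # move the traversable points into the reference camera frame
        pts_h = np.c_[pts, np.ones(len(pts))] @ (ref_inv @ pose).T
        pts_c = pts_h[pts_h[:, 2] > 0, :3]         # keep points in front
        uvw = (K @ pts_c.T).T
        uv = np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)
        ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        label[uv[ok, 1], uv[ok, 0]] = True
    return label   # one possible supervision target for the network
```

Accumulating ground points from many frames of the sequence is what lets such a label extend beyond the reference camera's line of sight.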
Related papers
- PathFinder: Attention-Driven Dynamic Non-Line-of-Sight Tracking with a Mobile Robot [3.387892563308912]
We introduce a novel approach to process a sequence of dynamic successive frames in a line-of-sight (LOS) video using an attention-based neural network.
We validate the approach on in-the-wild scenes using a drone for video capture, thus demonstrating low-cost NLOS imaging in dynamic capture environments.
arXiv Detail & Related papers (2024-04-07T17:31:53Z)
- Learning 3D Photography Videos via Self-supervised Diffusion on Single Images [105.81348348510551]
3D photography renders a static image into a video with appealing 3D visual effects.
Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints.
We present a novel task: out-animation, which extends the space and time of input objects.
arXiv Detail & Related papers (2023-02-21T16:18:40Z)
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z)
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
- Learning To Segment Dominant Object Motion From Watching Videos [72.57852930273256]
We envision a simple framework for dominant moving object segmentation that neither requires annotated data to train nor relies on saliency priors or pre-trained optical flow maps.
Inspired by a layered image representation, we introduce a technique to group pixel regions according to their affine parametric motion.
This enables our network to learn segmentation of the dominant foreground object using only RGB image pairs as input for both training and inference.
arXiv Detail & Related papers (2021-11-28T14:51:00Z)
- D-NeRF: Neural Radiance Fields for Dynamic Scenes [72.75686949608624]
We introduce D-NeRF, a method that extends neural radiance fields to a dynamic domain.
D-NeRF reconstructs images of objects under rigid and non-rigid motions from a camera moving around the scene.
We demonstrate the effectiveness of our approach on scenes with objects under rigid, articulated and non-rigid motions.
arXiv Detail & Related papers (2020-11-27T19:06:50Z)
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid.
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network.
arXiv Detail & Related papers (2020-08-13T06:29:01Z)
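The lift/splat idea summarised above can be written down compactly: in the published method, each pixel's features are spread along a predicted categorical depth distribution to fill a camera frustum, and all frustums are sum-pooled into a bird's-eye-view grid. The NumPy sketch below (illustrative names and shapes, not the released code) shows those two steps for a single camera:

```python
# Rough sketch of "lift" and "splat" under assumed shapes and conventions.
import numpy as np

def lift(features, depth_probs, K, cam_to_ego, depths):
    """features: CxHxW image features, depth_probs: DxHxW depth distribution,
    depths: (D,) depth bin centres (metres), cam_to_ego: 4x4 extrinsics.
    Returns (N, 3) ego-frame points and (N, C) features, N = D*H*W."""
    C, H, W = features.shape
    D = len(depths)
    # outer product: every pixel feature weighted by its depth distribution
    frustum_feats = depth_probs[:, None] * features[None]           # DxCxHxW
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T    # 3x(HW)
    rays = np.linalg.inv(K) @ pix                                   # 3x(HW)
    pts = (rays[None] * depths[:, None, None]).transpose(0, 2, 1)   # Dx(HW)x3
    pts = np.c_[pts.reshape(-1, 3), np.ones(D * H * W)] @ cam_to_ego.T
    return pts[:, :3], frustum_feats.transpose(0, 2, 3, 1).reshape(-1, C)

def splat(points, feats, grid_res=0.5, grid_size=100):
    """Sum-pool features into a (grid_size, grid_size, C) BEV grid."""
    C = feats.shape[1]
    bev = np.zeros((grid_size, grid_size, C))
    ij = np.floor(points[:, :2] / grid_res).astype(int) + grid_size // 2
    ok = (ij >= 0).all(1) & (ij < grid_size).all(1)
    np.add.at(bev, (ij[ok, 0], ij[ok, 1]), feats[ok])  # unbuffered sum-pool
    return bev
```

The splat step uses unbuffered accumulation (np.add.at) so that multiple frustum points falling into the same grid cell are summed rather than overwritten.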
This list is automatically generated from the titles and abstracts of the papers on this site.