Footprints and Free Space from a Single Color Image
- URL: http://arxiv.org/abs/2004.06376v1
- Date: Tue, 14 Apr 2020 09:29:17 GMT
- Title: Footprints and Free Space from a Single Color Image
- Authors: Jamie Watson, Michael Firman, Aron Monszpart, Gabriel J. Brostow
- Abstract summary: We introduce a model to predict the geometry of both visible and occluded traversable surfaces, given a single RGB image as input.
We learn from stereo video sequences, using camera poses, per-frame depth and semantic segmentation to form training data.
We find that a surprisingly low bar for spatial coverage of training scenes is required.
- Score: 32.57664001590537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the shape of a scene from a single color image is a formidable
computer vision task. However, most methods aim to predict the geometry of
surfaces that are visible to the camera, which is of limited use when planning
paths for robots or augmented reality agents. Such agents can only move when
grounded on a traversable surface, which we define as the set of classes which
humans can also walk over, such as grass, footpaths and pavement. Models which
predict beyond the line of sight often parameterize the scene with voxels or
meshes, which can be expensive to use in machine learning frameworks.
We introduce a model to predict the geometry of both visible and occluded
traversable surfaces, given a single RGB image as input. We learn from stereo
video sequences, using camera poses, per-frame depth and semantic segmentation
to form training data, which is used to supervise an image-to-image network. We
train models from the KITTI driving dataset, the indoor Matterport dataset, and
from our own casually captured stereo footage. We find that a surprisingly low
bar for spatial coverage of training scenes is required. We validate our
algorithm against a range of strong baselines, and include an assessment of our
predictions for a path-planning task.
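The abstract describes supervising an image-to-image network with training data built from stereo video: camera poses, per-frame depth and semantic segmentation. As a rough illustration only (not the authors' code; every function, variable and camera convention below is an assumption), traversable pixels observed anywhere in a sequence could be reprojected into a reference view to form a footprint mask that also covers ground hidden behind obstacles in that view:

```python
# Minimal sketch, assuming pinhole intrinsics K, cam-to-world poses, and
# per-frame depth + semantic segmentation, of how a traversable-footprint
# label for a reference image might be accumulated from a stereo sequence.
import numpy as np

def backproject(depth, K):
    """Lift an HxW depth map to Nx3 points in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3xN
    rays = np.linalg.inv(K) @ pix                                      # 3xN
    return (rays * depth.reshape(1, -1)).T                             # Nx3

def footprint_label(ref_pose, K, frames, traversable_ids, shape):
    """Mark reference-view pixels whose ray lands on traversable ground seen
    in *any* frame, including ground occluded in the reference image.
    frames: list of (depth, segmentation, cam_to_world pose) tuples."""
    h, w = shape
    label = np.zeros((h, w), dtype=bool)
    ref_inv = np.linalg.inv(ref_pose)              # world -> reference camera
    for depth, seg, pose in frames:
        pts = backproject(depth, K)                # Nx3, that frame's camera
        pts = pts[np.isin(seg.reshape(-1), traversable_ids)]
        # move the traversable points into the reference camera frame
        pts_h = np.c_[pts, np.ones(len(pts))] @ (ref_inv @ pose).T
        pts_c = pts_h[pts_h[:, 2] > 0, :3]         # keep points in front
        uvw = (K @ pts_c.T).T
        uv = np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)
        ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        label[uv[ok, 1], uv[ok, 0]] = True
    return label   # one possible supervision target for the network
```

Accumulating ground points from many frames of the sequence is what lets such a label extend beyond the reference camera's line of sight.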
Related papers
- PathFinder: Attention-Driven Dynamic Non-Line-of-Sight Tracking with a Mobile Robot [3.387892563308912]
We introduce a novel approach to process a sequence of dynamic successive frames in a line-of-sight (LOS) video using an attention-based neural network.
We validate the approach on in-the-wild scenes using a drone for video capture, thus demonstrating low-cost NLOS imaging in dynamic capture environments.
arXiv Detail & Related papers (2024-04-07T17:31:53Z)
- Learning 3D Photography Videos via Self-supervised Diffusion on Single Images [105.81348348510551]
3D photography renders a static image into a video with appealing 3D visual effects.
Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints.
We present a novel task: out-animation, which extends the space and time of input objects.
arXiv Detail & Related papers (2023-02-21T16:18:40Z)
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z)
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
- Learning To Segment Dominant Object Motion From Watching Videos [72.57852930273256]
We envision a simple framework for dominant moving object segmentation that neither requires annotated data to train nor relies on saliency priors or pre-trained optical flow maps.
Inspired by a layered image representation, we introduce a technique to group pixel regions according to their affine parametric motion.
This enables our network to learn segmentation of the dominant foreground object using only RGB image pairs as input for both training and inference.
arXiv Detail & Related papers (2021-11-28T14:51:00Z)
- D-NeRF: Neural Radiance Fields for Dynamic Scenes [72.75686949608624]
We introduce D-NeRF, a method that extends neural radiance fields to a dynamic domain.
D-NeRF reconstructs images of objects under rigid and non-rigid motions from a camera moving around the scene.
We demonstrate the effectiveness of our approach on scenes with objects under rigid, articulated and non-rigid motions.
arXiv Detail & Related papers (2020-11-27T19:06:50Z)
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid.
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network.
arXiv Detail & Related papers (2020-08-13T06:29:01Z)
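The lift/splat idea summarised above can be written down compactly: in the published method, each pixel's features are spread along a predicted categorical depth distribution to fill a camera frustum, and all frustums are sum-pooled into a bird's-eye-view grid. The NumPy sketch below (illustrative names and shapes, not the released code) shows those two steps for a single camera:

```python
# Rough sketch of "lift" and "splat" under assumed shapes and conventions.
import numpy as np

def lift(features, depth_probs, K, cam_to_ego, depths):
    """features: CxHxW image features, depth_probs: DxHxW depth distribution,
    depths: (D,) depth bin centres (metres), cam_to_ego: 4x4 extrinsics.
    Returns (N, 3) ego-frame points and (N, C) features, N = D*H*W."""
    C, H, W = features.shape
    D = len(depths)
    # outer product: every pixel feature weighted by its depth distribution
    frustum_feats = depth_probs[:, None] * features[None]           # DxCxHxW
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T    # 3x(HW)
    rays = np.linalg.inv(K) @ pix                                   # 3x(HW)
    pts = (rays[None] * depths[:, None, None]).transpose(0, 2, 1)   # Dx(HW)x3
    pts = np.c_[pts.reshape(-1, 3), np.ones(D * H * W)] @ cam_to_ego.T
    return pts[:, :3], frustum_feats.transpose(0, 2, 3, 1).reshape(-1, C)

def splat(points, feats, grid_res=0.5, grid_size=100):
    """Sum-pool features into a (grid_size, grid_size, C) BEV grid."""
    C = feats.shape[1]
    bev = np.zeros((grid_size, grid_size, C))
    ij = np.floor(points[:, :2] / grid_res).astype(int) + grid_size // 2
    ok = (ij >= 0).all(1) & (ij < grid_size).all(1)
    np.add.at(bev, (ij[ok, 0], ij[ok, 1]), feats[ok])  # unbuffered sum-pool
    return bev
```

The splat step uses unbuffered accumulation (np.add.at) so that multiple frustum points falling into the same grid cell are summed rather than overwritten.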
This list is automatically generated from the titles and abstracts of the papers on this site.