Related papers: Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image

URL: http://arxiv.org/abs/2406.04343v1
Date: Thu, 6 Jun 2024 17:59:56 GMT
Title: Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
Authors: Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, Andrea Vedaldi
Abstract summary: Flash3D is a method for scene reconstruction and novel view synthesis from a single image. For generalisability, we start from a "foundation" model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting.
Score: 80.48452783328995
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. For generalisability, we start from a "foundation" model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting. Specifically, we predict a first layer of 3D Gaussians at the predicted depth, and then add additional layers of Gaussians that are offset in space, allowing the model to complete the reconstruction behind occlusions and truncations. Flash3D is very efficient, trainable on a single GPU in a day, and thus accessible to most researchers. It achieves state-of-the-art results when trained and tested on RealEstate10k. When transferred to unseen datasets like NYU it outperforms competitors by a large margin. More impressively, when transferred to KITTI, Flash3D achieves better PSNR than methods trained specifically on that dataset. In some instances, it even outperforms recent methods that use multiple views as input. Code, models, demo, and more results are available at https://www.robots.ox.ac.uk/~vgg/research/flash3d/.
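The layered construction described in the abstract (a first layer of 3D Gaussians placed at the predicted depth, then further layers offset in space to cover occluded and truncated regions) can be illustrated with a minimal sketch. It assumes a pinhole camera and, as a simplifying assumption not confirmed by the abstract, that each extra layer is pushed further back along the same camera ray by a non-negative, cumulative depth offset. The names `Intrinsics` and `layered_gaussian_means` are hypothetical. The sketch computes only Gaussian centres; covariances, opacities, colours and the network that predicts depth and offsets are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Hypothetical pinhole intrinsics: focal lengths and principal point in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def unproject(u: float, v: float, depth: float, K: Intrinsics) -> Point3:
    """Back-project pixel (u, v) to a camera-space point at the given depth."""
    x = (u - K.cx) / K.fx * depth
    y = (v - K.cy) / K.fy * depth
    return (x, y, depth)


def layered_gaussian_means(
    depth: List[List[float]],
    offsets: List[List[List[float]]],
    K: Intrinsics,
) -> List[List[Point3]]:
    """Return one list of Gaussian centres per layer (hypothetical helper).

    depth[v][u] is the predicted depth of pixel (u, v) and defines layer 0.
    offsets[k][v][u] is a non-negative depth offset for layer k + 1; offsets are
    accumulated (an assumption) so every extra layer lies behind the previous one
    on the same camera ray.
    """
    H, W = len(depth), len(depth[0])
    current = [row[:] for row in depth]  # running per-pixel depth
    layers = [[unproject(u, v, current[v][u], K) for v in range(H) for u in range(W)]]
    for layer_offsets in offsets:
        for v in range(H):
            for u in range(W):
                delta = layer_offsets[v][u]
                if delta < 0:
                    raise ValueError("depth offsets must be non-negative")
                current[v][u] += delta
        layers.append([unproject(u, v, current[v][u], K) for v in range(H) for u in range(W)])
    return layers


if __name__ == "__main__":
    K = Intrinsics(fx=2.0, fy=2.0, cx=0.5, cy=0.5)
    depth = [[1.0, 2.0]]        # a 1 x 2 depth map
    offsets = [[[0.5, 0.0]]]    # one extra layer
    for i, layer in enumerate(layered_gaussian_means(depth, offsets, K)):
        print(i, layer)
```

Keeping offsets non-negative is one way to guarantee the extra Gaussians sit behind the visible surface, where they can fill in occluded content without covering the first layer.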

Related papers

GaussRender: Learning 3D Occupancy with Gaussian Rendering [86.89653628311565]
GaussRender is a module that improves 3D occupancy learning by enforcing projective consistency. Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure.
Date: 2025-02-07T16:07:51Z
SfM-Free 3D Gaussian Splatting via Hierarchical Training [42.85362760049813]
We propose a novel SfM-Free 3DGS (SFGS) method for video input, eliminating the need for known camera poses and SfM preprocessing. Our approach introduces a hierarchical training strategy that trains and merges multiple 3D Gaussian representations into a single, unified 3DGS model. Experimental results reveal that our approach significantly surpasses state-of-the-art SfM-free novel view synthesis methods.
Date: 2024-12-02T14:39:06Z
ZeroGS: Training 3D Gaussian Splatting from Unposed Images [62.34149221132978]
We propose ZeroGS to train 3DGS from hundreds of unposed and unordered images. Our method leverages a pretrained foundation model as the neural scene representation. Our method recovers more accurate camera poses than state-of-the-art pose-free NeRF/3DGS methods.
Date: 2024-11-24T11:20:48Z
Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting [0.0]
3D Gaussian splatting has surpassed neural radiance field methods in novel view synthesis. It produces high-quality renderings when many input views are available, but its performance drops significantly when only a few views are available. We propose a depth-aware Gaussian splatting method for few-shot novel view synthesis.
Date: 2024-10-14T20:42:30Z
Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs [29.669534899109028]
We introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs. Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information. Splatt3R can reconstruct scenes at 4 FPS at 512 × 512 resolution, and the resultant splats can be rendered in real time.
Date: 2024-08-25T18:27:20Z
GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on Gaussian Splatting representation for 3D object reconstruction from a single view. The model learns to generate 3D objects represented by sets of GS ellipsoids. The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
Date: 2024-07-05T03:43:08Z
DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
Date: 2024-06-17T21:15:13Z
Splatter Image: Ultra-Fast Single-View 3D Reconstruction [67.96212093828179]
Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images. We learn a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS. On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works.
Date: 2023-12-20T16:14:58Z
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, indicating its effectiveness.
Date: 2023-10-12T17:59:57Z

This list is automatically generated from the titles and abstracts of the papers on this site.