OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities
- URL: http://arxiv.org/abs/2412.16604v1
- Date: Sat, 21 Dec 2024 12:33:08 GMT
- Title: OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities
- Authors: Suyoung Lee, Jaeyoung Chung, Kihoon Kim, Jaeyoo Huh, Gunhee Lee, Minsoo Lee, Kyoung Mu Lee
- Abstract summary: We propose OmniSplat, a pioneering work for fast feed-forward 3DGS generation from a few omnidirectional images. We demonstrate higher reconstruction accuracy than existing feed-forward networks trained on perspective images.
- Score: 44.255563018074575
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Feed-forward 3D Gaussian Splatting (3DGS) models have gained significant popularity due to their ability to generate scenes immediately without per-scene optimization. Although omnidirectional images are becoming more popular, since they reduce the computation needed for image stitching to composite a holistic scene, existing feed-forward models are designed only for perspective images. The unique optical properties of omnidirectional images make it difficult for feature encoders to correctly understand the context of the image and make the Gaussians non-uniform in space, which degrades the quality of images synthesized from novel views. We propose OmniSplat, a pioneering work for fast feed-forward 3DGS generation from a few omnidirectional images. We introduce the Yin-Yang grid and decompose images based on it to reduce the domain gap between omnidirectional and perspective images. The Yin-Yang grid works with existing CNN structures as-is, and its quasi-uniform characteristic makes each decomposed image resemble a perspective image, allowing the network to exploit the strong prior knowledge of the learned feed-forward model. OmniSplat demonstrates higher reconstruction accuracy than existing feed-forward networks trained on perspective images. Furthermore, we enhance segmentation consistency between omnidirectional images by leveraging attention from the encoder of OmniSplat, providing fast and clean 3DGS editing results.
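The Yin-Yang decomposition the abstract refers to can be pictured concretely. Below is a minimal, hypothetical sketch (not the authors' code) of splitting an equirectangular panorama into the two quasi-uniform Yin and Yang patches; the function names, patch resolution, and nearest-neighbour sampling are illustrative assumptions.

```python
# Illustrative sketch of Yin-Yang decomposition of an equirectangular
# panorama. The Yang grid is the Yin grid rotated on the sphere so that
# the two patches together cover the full sphere.
import numpy as np

def yin_grid(h, w):
    """Lat/lon samples of the Yin patch: |lat| <= 45 deg, |lon| <= 135 deg."""
    lat = np.linspace(-np.pi / 4, np.pi / 4, h)
    lon = np.linspace(-3 * np.pi / 4, 3 * np.pi / 4, w)
    return np.meshgrid(lat, lon, indexing="ij")

def yang_from_yin(lat, lon):
    """Map Yin-patch coordinates to global lat/lon via the standard
    Yin-Yang rotation of the unit sphere: (x, y, z) -> (-x, z, y)."""
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    xe, ye, ze = -x, z, y
    return np.arcsin(ze), np.arctan2(ye, xe)

def sample_equirect(img, lat, lon):
    """Nearest-neighbour lookup of (lat, lon) directions in an
    equirectangular image of shape (H, W, 3)."""
    H, W = img.shape[:2]
    u = ((lon / (2 * np.pi) + 0.5) * (W - 1)).round().astype(int) % W
    v = ((0.5 - lat / np.pi) * (H - 1)).round().astype(int)
    return img[v, u]

# Usage: decompose a panorama into two quasi-perspective-like patches
# that a CNN trained on perspective images can consume.
pano = np.random.rand(512, 1024, 3)          # stand-in panorama
lat, lon = yin_grid(256, 768)
yin_patch = sample_equirect(pano, lat, lon)
yang_patch = sample_equirect(pano, *yang_from_yin(lat, lon))
```

Because each patch stays away from the poles of its own coordinate system, the severe pixel stretching of the equirectangular projection never appears inside a patch, which is what lets a perspective-trained encoder transfer its priors.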
Related papers
- VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames [8.746291192336056]
We present VicaSplat, a novel framework for joint 3D Gaussian reconstruction and camera pose estimation.
The core of our method lies in a novel transformer-based network architecture.
arXiv Detail & Related papers (2025-03-13T11:56:05Z)
- Splatter-360: Generalizable 360$^{\circ}$ Gaussian Splatting for Wide-baseline Panoramic Images [52.48351378615057]
Splatter-360 is a novel end-to-end generalizable 3DGS framework for handling wide-baseline panoramic images.
We introduce a 3D-aware bi-projection encoder to mitigate the distortions inherent in panoramic images.
This enables robust 3D-aware feature representations and real-time rendering capabilities.
arXiv Detail & Related papers (2024-12-09T06:58:31Z)
- NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images.
It leverages novel-view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z)
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs [29.669534899109028]
We introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs.
Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information.
Splatt3R can reconstruct scenes at 4 FPS at 512 x 512 resolution, and the resultant splats can be rendered in real time.
arXiv Detail & Related papers (2024-08-25T18:27:20Z)
- FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes [50.534213038479926]
FreeSplat is capable of reconstructing geometrically consistent 3D scenes from long-sequence input for free-view synthesis.
We propose a simple but effective free-view training strategy that ensures robust view synthesis across a broader view range, regardless of the number of views.
arXiv Detail & Related papers (2024-05-28T08:40:14Z)
- OmniGS: Fast Radiance Field Reconstruction using Omnidirectional Gaussian Splatting [27.543561055868697]
Current 3D Gaussian Splatting systems only support radiance field reconstruction using undistorted perspective images.
We present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstruction.
arXiv Detail & Related papers (2024-04-04T05:10:26Z)
- Splatter Image: Ultra-Fast Single-View 3D Reconstruction [67.96212093828179]
Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images.
We learn a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS.
On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works.
arXiv Detail & Related papers (2023-12-20T16:14:58Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering (see the sketch after this list).
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
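For reference, the volume-rendering step mentioned in the Vision Transformer entry above is the standard NeRF quadrature along each camera ray. The sketch below is a generic illustration of that composite, not the paper's code; the function name and sample counts are assumptions.

```python
# Generic volume-rendering quadrature along one camera ray:
# color = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
# where sigma_i and c_i would come from the conditioned MLP.
import numpy as np

def volume_render(sigmas, colors, deltas):
    """sigmas: (N,) densities, colors: (N, 3) RGB, deltas: (N,) segment lengths."""
    alphas = 1.0 - np.exp(-sigmas * deltas)            # per-sample opacity
    # T_i: transmittance accumulated before sample i
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)     # final pixel color

# Example: composite 64 stand-in samples along one ray
rgb = volume_render(np.random.rand(64), np.random.rand(64, 3), np.full(64, 0.05))
```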