Related papers: OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View

URL: http://arxiv.org/abs/2506.05204v1
Date: Thu, 05 Jun 2025 16:17:18 GMT
Title: OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View
Authors: Yanbo Wang, Ziyi Wang, Wenzhao Zheng, Jie Zhou, Jiwen Lu
Abstract summary: We propose OGGSplat, an open Gaussian growing method that expands the field-of-view in generalizable 3D reconstruction. Our key insight is that the semantic attributes of open Gaussians provide strong priors for image extrapolation. OGGSplat also demonstrates promising semantic-aware scene reconstruction capabilities when provided with two view images captured directly from a smartphone camera.
Score: 74.58230239274123
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reconstructing semantic-aware 3D scenes from sparse views is a challenging yet essential research direction, driven by the demands of emerging applications such as virtual reality and embodied AI. Existing per-scene optimization methods require dense input views and incur high computational costs, while generalizable approaches often struggle to reconstruct regions outside the input view cone. In this paper, we propose OGGSplat, an open Gaussian growing method that expands the field-of-view in generalizable 3D reconstruction. Our key insight is that the semantic attributes of open Gaussians provide strong priors for image extrapolation, enabling both semantic consistency and visual plausibility. Specifically, once open Gaussians are initialized from sparse views, we introduce an RGB-semantic consistent inpainting module applied to selected rendered views. This module enforces bidirectional control between an image diffusion model and a semantic diffusion model. The inpainted regions are then lifted back into 3D space for efficient and progressive Gaussian parameter optimization. To evaluate our method, we establish a Gaussian Outpainting (GO) benchmark that assesses both semantic and generative quality of reconstructed open-vocabulary scenes. OGGSplat also demonstrates promising semantic-aware scene reconstruction capabilities when provided with two view images captured directly from a smartphone camera.

Related papers

3D Gaussian Splatting for Fine-Detailed Surface Reconstruction in Large-Scale Scene [9.344622188779308]
This paper proposes a novel solution to reconstruct large-scale surfaces with fine details, supervised by full-sized images. We introduce a coarse-to-fine strategy to reconstruct a coarse model efficiently, followed by adaptive scene partitioning and sub-scene refining. Experiments were conducted on the publicly available dataset GauU-Scene V2, which was captured using unmanned aerial vehicles.
arXiv Detail & Related papers (2025-06-21T08:41:28Z)
ODG: Occupancy Prediction Using Dual Gaussians [38.9869091446875]
Occupancy prediction infers fine-grained 3D geometry and semantics from camera images of the surrounding environment. Existing methods either adopt dense grids as scene representation, or learn the entire scene using a single set of sparse queries. We present ODG, a hierarchical dual sparse Gaussian representation to effectively capture complex scene dynamics.
arXiv Detail & Related papers (2025-06-11T06:03:03Z)
UniForward: Unified 3D Scene and Semantic Field Reconstruction via Feed-Forward Gaussian Splatting from Only Sparse-View Images [43.40816438003861]
We propose a feed-forward model that unifies 3D scene and semantic field reconstruction. Our UniForward can reconstruct 3D scenes and the corresponding semantic fields in real time from only sparse-view images. Experiments on novel view synthesis and novel view segmentation demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-06-11T04:01:21Z)
OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies [112.80292725951921]
OVGaussian is a generalizable Open-Vocabulary 3D semantic segmentation framework based on the 3D Gaussian representation. We first construct a large-scale 3D scene dataset based on 3DGS, dubbed SegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images. To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR), which leverages a
arXiv Detail & Related papers (2024-12-31T07:55:35Z)
MonoGSDF: Exploring Monocular Geometric Cues for Gaussian Splatting-Guided Implicit Surface Reconstruction [84.07233691641193]
We introduce MonoGSDF, a novel method that couples primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction. To handle arbitrary-scale scenes, we propose a scaling strategy for robust generalization. Experiments on real-world datasets show that the method outperforms prior methods while maintaining efficiency.
arXiv Detail & Related papers (2024-11-25T20:07:07Z)
NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images. We leverage the novel view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z)
GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting. We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z)
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images. Our model achieves real-time 3D Gaussian reconstruction during inference. This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
AugGS: Self-augmented Gaussians with Structural Masks for Sparse-view 3D Reconstruction [9.953394373473621]
Sparse-view 3D reconstruction is a major challenge in computer vision. We propose a self-augmented two-stage Gaussian splatting framework enhanced with structural masks for sparse-view 3D reconstruction. Our approach achieves state-of-the-art performance in perceptual quality and multi-view consistency with sparse inputs.
arXiv Detail & Related papers (2024-08-09T03:09:22Z)
Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians. We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference.
arXiv Detail & Related papers (2024-03-22T21:28:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.