SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images
- URL: http://arxiv.org/abs/2505.23044v2
- Date: Fri, 10 Oct 2025 10:40:55 GMT
- Title: SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images
- Authors: Yu Sheng, Jiajun Deng, Xinran Zhang, Yu Zhang, Bei Hua, Yanyong Zhang, Jianmin Ji
- Abstract summary: We introduce SpatialSplat, a feedforward framework that produces redundancy-aware Gaussians. We demonstrate a remarkable 60% reduction in scene representation parameters while achieving superior performance over state-of-the-art methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A major breakthrough in 3D reconstruction is the feedforward paradigm to generate pixel-wise 3D points or Gaussian primitives from sparse, unposed images. To further incorporate semantics while avoiding the significant memory and storage costs of high-dimensional semantic features, existing methods extend this paradigm by associating each primitive with a compressed semantic feature vector. However, these methods have two major limitations: (a) the naively compressed feature compromises expressiveness, affecting the model's ability to capture fine-grained semantics, and (b) the pixel-wise primitive prediction introduces redundancy in overlapping areas, causing unnecessary memory overhead. To this end, we introduce SpatialSplat, a feedforward framework that produces redundancy-aware Gaussians and capitalizes on a dual-field semantic representation. Particularly, with the insight that primitives within the same instance exhibit high semantic consistency, we decompose the semantic representation into a coarse feature field that encodes uncompressed semantics with minimal primitives, and a fine-grained yet low-dimensional feature field that captures detailed inter-instance relationships. Moreover, we propose a selective Gaussian mechanism, which retains only essential Gaussians in the scene, effectively eliminating redundant primitives. Our proposed SpatialSplat learns accurate semantic information and detailed instance priors with more compact 3D Gaussians, making semantic 3D reconstruction more practical. We conduct extensive experiments to evaluate our method, demonstrating a remarkable 60% reduction in scene representation parameters while achieving superior performance over state-of-the-art methods. The code is available at https://github.com/shengyuuu/SpatialSplat.git
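The dual-field decomposition described in the abstract can be sketched numerically. The following minimal NumPy toy is an illustration only, not the paper's architecture: all sizes, the random features, and the linear `lift` map back to the full dimension are assumptions. Each primitive's full-dimensional semantics are recovered as a shared per-instance coarse vector plus a lifted low-dimensional fine vector:

```python
import numpy as np

# Hypothetical sizes (not from the paper): 4 instances, 1000 primitives,
# full semantic dim 512, compact per-primitive dim 16.
N_INST, N_PRIM, D_FULL, D_LOW = 4, 1000, 512, 16

rng = np.random.default_rng(0)
instance_id = rng.integers(0, N_INST, size=N_PRIM)   # instance of each primitive
coarse_field = rng.standard_normal((N_INST, D_FULL)) # uncompressed semantics, few entries
fine_field = rng.standard_normal((N_PRIM, D_LOW))    # low-dim per-primitive detail
lift = 0.01 * rng.standard_normal((D_LOW, D_LOW * 0 + D_FULL))  # toy projection to full dim

def primitive_semantics(idx):
    """Full-dim semantics of one primitive = shared coarse part + lifted fine part."""
    return coarse_field[instance_id[idx]] + fine_field[idx] @ lift

# Parameter count: the dual-field layout stores far fewer numbers than
# attaching a full-dim feature to every primitive.
dual_params = N_INST * D_FULL + N_PRIM * D_LOW   # 18,048
naive_params = N_PRIM * D_FULL                   # 512,000
```

Counting parameters makes the motivation concrete: the coarse field pays the full dimension only once per instance, while the per-primitive cost drops to the low dimension.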
Related papers
- Prune Wisely, Reconstruct Sharply: Compact 3D Gaussian Splatting via Adaptive Pruning and Difference-of-Gaussian Primitives [14.295266671241004]
3D Gaussian Splatting (3DGS) has enabled real-time rendering with photorealistic quality. However, 3DGS often requires a large number of primitives to achieve high fidelity. We propose an efficient, integrated reconstruction-aware pruning strategy that determines pruning timing and refining intervals. We also introduce a 3D Difference-of-Gaussians primitive that jointly models both positive and negative densities in a single primitive.
arXiv Detail & Related papers (2026-02-27T16:12:58Z)
- SUG-Occ: An Explicit Semantics and Uncertainty Guided Sparse Learning Framework for Real-Time 3D Occupancy Prediction [5.730573889498275]
We propose SUG-Occ, an explicit Semantics and Uncertainty Guided Sparse Learning Enabled 3D Occupancy Prediction Framework. We first utilize semantic and uncertainty priors to suppress projections from free space during view transformation. We then employ an explicit unsigned distance encoding to enhance geometric consistency, producing a structurally consistent sparse 3D representation.
arXiv Detail & Related papers (2026-01-16T16:07:38Z)
- C3G: Learning Compact 3D Representations with 2K Gaussians [55.04010158339562]
Recent approaches use per-pixel 3D Gaussian Splatting for reconstruction, followed by a 2D-to-3D feature lifting stage for scene understanding. We propose C3G, a novel feed-forward framework that estimates compact 3D Gaussians only at essential spatial locations.
arXiv Detail & Related papers (2025-12-03T17:59:05Z)
- Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation [83.90109373769614]
3D Gaussian Splatting (3D-GS) has emerged as an efficient 3D representation and a promising foundation for semantic tasks like segmentation. We propose a coarse-to-fine binary encoding scheme for per-Gaussian category representation, which compresses each feature into a single integer via binary-to-decimal mapping. We further design a progressive training strategy that decomposes panoptic segmentation into a series of independent sub-tasks, reducing inter-class conflicts and thereby enhancing fine-grained segmentation capability.
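The binary-to-decimal idea in this abstract can be illustrated with a small NumPy sketch. It is an assumption-laden toy, not the paper's training scheme: the number of Gaussians, the random logits, and the hard thresholding are all hypothetical. Each Gaussian carries a few binary channels, which are packed into one integer per Gaussian:

```python
import math
import numpy as np

NUM_CLASSES = 10
NUM_BITS = math.ceil(math.log2(NUM_CLASSES))  # 4 bits suffice for 10 classes

rng = np.random.default_rng(0)
# Hypothetical learned per-Gaussian binary logits (one column per bit channel).
logits = rng.standard_normal((1000, NUM_BITS))

bits = (logits > 0).astype(np.int64)   # hard binary channels
powers = 2 ** np.arange(NUM_BITS)      # binary-to-decimal weights: 1, 2, 4, 8
codes = bits @ powers                  # one integer per Gaussian instead of a feature vector

# Decoding reads the bits back out of the packed integer.
recovered = (codes[:, None] >> np.arange(NUM_BITS)) & 1
```

Because the bits are exactly 0 or 1, packing and unpacking are lossless, which is what lets a single integer stand in for a per-Gaussian category feature.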
arXiv Detail & Related papers (2025-11-30T15:51:30Z)
- VG3T: Visual Geometry Grounded Gaussian Transformer [18.15986152198467]
VG3T is a novel multi-view feed-forward network that predicts 3D semantic occupancy via a 3D Gaussian representation. We show a notable 1.7%p improvement in mIoU while using 46% fewer primitives than the previous state-of-the-art on the nuScenes benchmark.
arXiv Detail & Related papers (2025-11-28T07:27:20Z)
- SegSplat: Feed-forward Gaussian Splatting and Open-Set Semantic Segmentation [114.57192386025373]
SegSplat is a novel framework designed to bridge the gap between rapid, feed-forward 3D reconstruction and rich, open-vocabulary semantic understanding. This work represents a significant step towards practical, on-the-fly generation of semantically aware 3D environments.
arXiv Detail & Related papers (2025-11-23T10:26:38Z)
- SplatSSC: Decoupled Depth-Guided Gaussian Splatting for Semantic Scene Completion [31.116931865374564]
3D Semantic Scene Completion is a challenging yet promising task that aims to infer dense geometric and semantic descriptions of a scene from a single image. We propose SplatSSC, a novel framework that addresses limitations of prior approaches with a depth-guided initialization strategy and a principled Gaussian aggregator. Our method achieves state-of-the-art performance on the Occ-ScanNet dataset, outperforming prior approaches by over 6.3% in IoU and 4.1% in mIoU.
arXiv Detail & Related papers (2025-08-04T10:09:31Z)
- ODG: Occupancy Prediction Using Dual Gaussians [38.9869091446875]
Occupancy prediction infers fine-grained 3D geometry and semantics from camera images of the surrounding environment. Existing methods either adopt dense grids as the scene representation or learn the entire scene using a single set of sparse queries. We present ODG, a hierarchical dual sparse Gaussian representation that effectively captures complex scene dynamics.
arXiv Detail & Related papers (2025-06-11T06:03:03Z)
- UniForward: Unified 3D Scene and Semantic Field Reconstruction via Feed-Forward Gaussian Splatting from Only Sparse-View Images [43.40816438003861]
We propose a feed-forward model that unifies 3D scene and semantic field reconstruction. Our UniForward can reconstruct 3D scenes and the corresponding semantic fields in real time from only sparse-view images. Experiments on novel view synthesis and novel view segmentation demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-06-11T04:01:21Z)
- OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View [74.58230239274123]
We propose OGGSplat, an open Gaussian growing method that expands the field-of-view in generalizable 3D reconstruction. Our key insight is that the semantic attributes of open Gaussians provide strong priors for image extrapolation. OGGSplat also demonstrates promising semantic-aware scene reconstruction capabilities when provided with two view images captured directly from a smartphone camera.
arXiv Detail & Related papers (2025-06-05T16:17:18Z)
- FHGS: Feature-Homogenized Gaussian Splatting [7.238124816235862]
FHGS is a novel 3D feature fusion framework inspired by physical models. It can achieve high-precision mapping of arbitrary 2D features from pre-trained models to 3D scenes while preserving the real-time rendering efficiency of 3DGS.
arXiv Detail & Related papers (2025-05-25T14:08:49Z)
- CompGS++: Compressed Gaussian Splatting for Static and Dynamic Scene Representation [60.712165339762116]
CompGS++ is a novel framework that leverages compact Gaussian primitives to achieve accurate 3D modeling. Our design is based on the principle of eliminating redundancy both between and within primitives. Our implementation will be made publicly available on GitHub to facilitate further research.
arXiv Detail & Related papers (2025-04-17T15:33:01Z)
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields [73.49548565633123]
Radiance fields represented by 3D Gaussians excel at synthesizing novel views, offering both high training efficiency and fast rendering.
Existing methods often incorporate depth priors from dense estimation networks but overlook the inherent multi-view consistency in input images.
We propose a framework based on 3D Gaussian Splatting, named MCGS, that enables scene reconstruction from sparse input views.
arXiv Detail & Related papers (2024-10-15T08:39:05Z)
- CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding [32.76277160013881]
We present CLIP-GS, which integrates semantics from Contrastive Language-Image Pre-Training (CLIP) into Gaussian Splatting.
SAC exploits the inherent unified semantics within objects to learn compact yet effective semantic representations of 3D Gaussians.
We also introduce a 3D Coherent Self-training (3DCS) strategy, leveraging the multi-view consistency inherent in the 3D model.
arXiv Detail & Related papers (2024-04-22T15:01:32Z)
- CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting [68.94594215660473]
We propose an efficient 3D scene representation, named Compressed Gaussian Splatting (CompGS).
We exploit a small set of anchor primitives for prediction, allowing the majority of primitives to be encapsulated into highly compact residual forms.
Experimental results show that the proposed CompGS significantly outperforms existing methods, achieving superior compactness in 3D scene representation without compromising model accuracy and rendering quality.
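The anchor-and-residual scheme CompGS describes can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not CompGS itself: the counts, the 59-parameter primitive size, the nearest-anchor assignment, and the uniform quantization step are all illustrative. A few anchors predict many primitives, so only the anchors plus small quantized residuals need to be stored:

```python
import numpy as np

rng = np.random.default_rng(0)
N_ANCHOR, N_PRIM, D = 8, 1000, 59  # D ~ parameters of one 3DGS primitive (assumed)

anchors = rng.standard_normal((N_ANCHOR, D))
assign = rng.integers(0, N_ANCHOR, size=N_PRIM)  # anchor predicting each primitive
# Primitives cluster tightly around their anchors, so residuals are small.
primitives = anchors[assign] + 0.05 * rng.standard_normal((N_PRIM, D))

residuals = primitives - anchors[assign]    # small-magnitude values, cheap to encode
step = 0.01                                 # uniform quantization step (assumed)
residuals_q = np.round(residuals / step) * step

reconstructed = anchors[assign] + residuals_q
max_err = np.abs(reconstructed - primitives).max()  # bounded by step / 2
```

Because the residuals are small, coarse quantization still reconstructs every primitive to within half a quantization step, which is why the residual form is so compact.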
arXiv Detail & Related papers (2024-04-15T04:50:39Z)
- latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction [48.86083272054711]
latentSplat is a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a lightweight generative 2D architecture.
We show that latentSplat outperforms previous works in reconstruction quality and generalization, while being fast and scalable to high-resolution data.
arXiv Detail & Related papers (2024-03-24T20:48:36Z)
- Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting.
Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians.
We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference.
arXiv Detail & Related papers (2024-03-22T21:28:19Z)
- SwIPE: Efficient and Robust Medical Image Segmentation with Implicit Patch Embeddings [12.79344668998054]
We propose SwIPE (Segmentation with Implicit Patch Embeddings) to enable accurate local boundary delineation and global shape coherence.
We show that SwIPE significantly improves over recent implicit approaches and outperforms state-of-the-art discrete methods with over 10x fewer parameters.
arXiv Detail & Related papers (2023-07-23T20:55:11Z)
- Convolutional Occupancy Networks [88.48287716452002]
We propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes.
By combining convolutional encoders with implicit occupancy decoders, our model incorporates inductive biases, enabling structured reasoning in 3D space.
We empirically find that our method enables the fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
arXiv Detail & Related papers (2020-03-10T10:17:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.