CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation
- URL: http://arxiv.org/abs/2511.17904v1
- Date: Sat, 22 Nov 2025 03:42:49 GMT
- Title: CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation
- Authors: Yuhang Ming, Chenxin Fang, Xingyuan Yu, Fan Zhang, Weichen Dai, Wanzeng Kong, Guofeng Zhang
- Abstract summary: CUS-GS is a compact unified structured Gaussian Splatting representation. We propose a feature-aware significance evaluation strategy to guide anchor growing and pruning. CUS-GS achieves competitive performance compared to state-of-the-art methods using as few as 6M parameters.
- Score: 16.85102888388904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Gaussian Splatting-based 3D scene representation have shown two major trends: semantics-oriented approaches that focus on high-level understanding but lack explicit 3D geometry modeling, and structure-oriented approaches that capture spatial structures yet provide limited semantic abstraction. To bridge this gap, we present CUS-GS, a compact unified structured Gaussian Splatting representation, which connects multimodal semantic features with structured 3D geometry. Specifically, we design a voxelized anchor structure that constructs a spatial scaffold, while extracting multimodal semantic features from a set of foundation models (e.g., CLIP, DINOv2, SEEM). Moreover, we introduce a multimodal latent feature allocation mechanism to unify appearance, geometry, and semantics across heterogeneous feature spaces, ensuring a consistent representation across multiple foundation models. Finally, we propose a feature-aware significance evaluation strategy to dynamically guide anchor growing and pruning, effectively removing redundant or invalid anchors while maintaining semantic integrity. Extensive experiments show that CUS-GS achieves competitive performance compared to state-of-the-art methods using as few as 6M parameters (an order of magnitude smaller than the closest rival at 35M), highlighting the excellent trade-off between performance and model efficiency of the proposed framework.
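The abstract names two mechanisms without giving their formulas: a voxelized anchor scaffold and a feature-aware significance score that drives anchor growing and pruning. The Python sketch below illustrates one way these ideas can fit together, under stated assumptions. The names `Anchor`, `voxelize_anchors`, `significance`, and `update_anchors`, the per-anchor statistics, the equal 0.5 weighting, and the thresholds are all hypothetical illustrations, not the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass
class Anchor:
    """One anchor of the voxel scaffold (hypothetical minimal fields)."""
    center: tuple[float, float, float]
    opacity_sum: float = 0.0   # accumulated rendering contribution (assumed statistic)
    feature_grad: float = 0.0  # accumulated semantic-feature gradient norm (assumed statistic)
    visits: int = 0            # number of views in which the anchor was visible


def voxelize_anchors(points, voxel_size):
    """Snap input points to a regular grid and place one anchor per occupied voxel."""
    occupied = {
        (math.floor(x / voxel_size), math.floor(y / voxel_size), math.floor(z / voxel_size))
        for x, y, z in points
    }
    # Each anchor sits at its voxel center; sorting makes the output order deterministic.
    return [Anchor(tuple((i + 0.5) * voxel_size for i in key)) for key in sorted(occupied)]


def significance(anchor, weight=0.5):
    """Hypothetical feature-aware score: weighted mix of an appearance term and a semantic term."""
    if anchor.visits == 0:
        return 0.0
    appearance = anchor.opacity_sum / anchor.visits
    semantic = anchor.feature_grad / anchor.visits
    return weight * appearance + (1.0 - weight) * semantic


def update_anchors(anchors, prune_tau, grow_tau, weight=0.5):
    """Drop anchors scoring below prune_tau; flag those scoring above grow_tau for growing."""
    kept, grow_candidates = [], []
    for anchor in anchors:
        score = significance(anchor, weight)
        if score < prune_tau:
            continue  # treated as redundant or invalid and removed
        kept.append(anchor)
        if score > grow_tau:
            grow_candidates.append(anchor)
    return kept, grow_candidates


# Usage: two of the four points share a voxel, so three anchors are created.
points = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.1), (1.5, 0.2, 0.2), (2.5, 0.1, 0.1)]
anchors = voxelize_anchors(points, voxel_size=1.0)
```

Because the score mixes a semantic term with the appearance term, an anchor that contributes little to rendered color but still receives a strong feature gradient survives pruning. This is one possible reading of "maintaining semantic integrity"; the criterion used in CUS-GS may differ.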
Related papers
- SSR: Pushing the Limit of Spatial Intelligence with Structured Scene Reasoning [30.87517633729756]
SSR is a framework designed for Structured Scene Reasoning. It seamlessly integrates 2D and 3D representations via a lightweight alignment mechanism. It achieves state-of-the-art performance on multiple spatial intelligence benchmarks.
Date: 2026-02-28T02:05:35Z
- CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval [54.15776146365823]
Composed Image Retrieval (CIR) enables users to search for target images using both a reference image and manipulation text. We propose CSMCIR, a unified representation framework that achieves efficient query-target alignment through three synergistic components.
Date: 2026-01-07T09:21:38Z
- SegSplat: Feed-forward Gaussian Splatting and Open-Set Semantic Segmentation [114.57192386025373]
SegSplat is a novel framework designed to bridge the gap between rapid, feed-forward 3D reconstruction and rich, open-vocabulary semantic understanding. This work represents a significant step towards practical, on-the-fly generation of semantically aware 3D environments.
Date: 2025-11-23T10:26:38Z
- Learning Topology-Driven Multi-Subspace Fusion for Grassmannian Deep Network [31.003374497881968]
The Grassmannian manifold offers a powerful carrier for geometric representation learning. We propose a topology-driven multi-subspace fusion framework that enables adaptive subspace collaboration on the Grassmannian. Our work advances geometric deep learning and adapts the proven multi-channel interaction philosophy of Euclidean networks to non-Euclidean domains.
Date: 2025-11-09T10:33:13Z
- Complementary Information Guided Occupancy Prediction via Multi-Level Representation Fusion [73.11061598576798]
Camera-based occupancy prediction is a mainstream approach for 3D perception in autonomous driving. CIGOcc is a two-stage occupancy prediction framework based on multi-level representation fusion. CIGOcc extracts segmentation, graphics, and depth features from an input image and introduces a deformable multi-level fusion mechanism.
Date: 2025-10-15T06:37:33Z
- Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes [60.92139345612904]
We present Light-SQ, a novel superquadric-based optimization framework. We propose a block-regrow-fill strategy guided by structure-aware volumetric decomposition. Experiments demonstrate that Light-SQ enables efficient, high-fidelity, and editable shape abstraction with superquadrics.
Date: 2025-09-29T16:18:32Z
- Hierarchical Neural Semantic Representation for 3D Semantic Correspondence [72.8101601086805]
First, we design the hierarchical neural semantic representation (HNSR), which consists of a global semantic feature to capture high-level structure and multi-resolution local geometric features. Second, we design a progressive global-to-local matching strategy, which establishes coarse semantic correspondence using the global semantic feature. Third, our framework is training-free and broadly compatible with various pre-trained 3D generative backbones, demonstrating strong generalization across diverse shape categories.
Date: 2025-09-22T07:23:07Z
- FHGS: Feature-Homogenized Gaussian Splatting [7.238124816235862]
FHGS is a novel 3D feature fusion framework inspired by physical models. It can achieve high-precision mapping of arbitrary 2D features from pre-trained models to 3D scenes while preserving the real-time rendering efficiency of 3DGS.
Date: 2025-05-25T14:08:49Z
- econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians [56.85804719947]
We propose econSG for open-vocabulary semantic segmentation with 3DGS. Our econSG shows state-of-the-art performance on four benchmark datasets compared to the existing methods.
Date: 2025-04-08T13:12:31Z