LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
- URL: http://arxiv.org/abs/2507.07136v2
- Date: Wed, 08 Oct 2025 03:25:34 GMT
- Title: LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
- Authors: Wanhua Li, Yujie Zhao, Minghan Qin, Yang Liu, Yuanhao Cai, Chuang Gan, Hanspeter Pfister
- Abstract summary: LangSplatV2 achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS for high-resolution images. LangSplatV2 not only achieves better or competitive query accuracy but is also significantly faster.
- Score: 60.933341835615465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce LangSplatV2, which achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS for high-resolution images, providing a 42$\times$ speedup and a 47$\times$ boost over LangSplat, respectively, along with improved query accuracy. LangSplat employs Gaussian Splatting to embed 2D CLIP language features into 3D, significantly enhancing speed and learning a precise 3D language field with SAM semantics. Such advancements in 3D language fields are crucial for applications that require language interaction within complex scenes. However, LangSplat does not yet achieve real-time inference performance (8.2 FPS), even with advanced A100 GPUs, severely limiting its broader application. In this paper, we first conduct a detailed time analysis of LangSplat, identifying the heavyweight decoder as the primary speed bottleneck. Our solution, LangSplatV2, assumes that each Gaussian acts as a sparse code within a global dictionary, leading to the learning of a 3D sparse coefficient field that entirely eliminates the need for a heavyweight decoder. By leveraging this sparsity, we further propose an efficient sparse coefficient splatting method with CUDA optimization, rendering high-dimensional feature maps at high quality while incurring only the time cost of splatting an ultra-low-dimensional feature. Our experimental results demonstrate that LangSplatV2 not only achieves better or competitive query accuracy but is also significantly faster. Code and demos are available at our project page: https://langsplat-v2.github.io.
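The sparse-code idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation or their CUDA kernel. It only shows why blending low-dimensional coefficients and decoding once can match blending full high-dimensional features: the dictionary lookup is linear. The toy sizes, the helper names (`decode`, `sparse_code`, `blend`), the random blending weights, and the choice that each code's non-zero weights sum to 1 are all assumptions made for illustration.

```python
import random

def decode(coeffs, dictionary):
    """Dense feature = sum_k coeffs[k] * dictionary[k] (a plain linear map)."""
    dim = len(dictionary[0])
    return [sum(c * atom[d] for c, atom in zip(coeffs, dictionary)) for d in range(dim)]

def sparse_code(num_atoms, top_k, rng):
    """Hypothetical per-Gaussian code: top_k non-zero weights that sum to 1."""
    idx = rng.sample(range(num_atoms), top_k)
    raw = [rng.random() + 1e-3 for _ in idx]
    total = sum(raw)
    code = [0.0] * num_atoms
    for i, r in zip(idx, raw):
        code[i] = r / total
    return code

def blend(vectors, weights):
    """Weighted sum of vectors, standing in for alpha blending along one ray."""
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(len(vectors[0]))]

rng = random.Random(0)
# Toy sizes chosen for readability; they are not the paper's settings.
NUM_ATOMS, FEATURE_DIM, TOP_K, NUM_GAUSSIANS = 8, 16, 2, 5

dictionary = [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)] for _ in range(NUM_ATOMS)]
codes = [sparse_code(NUM_ATOMS, TOP_K, rng) for _ in range(NUM_GAUSSIANS)]
weights = [rng.random() for _ in range(NUM_GAUSSIANS)]  # stand-in blending weights

# Route A: decode every Gaussian to FEATURE_DIM, then blend the dense features.
dense_first = blend([decode(c, dictionary) for c in codes], weights)

# Route B: blend the low-dimensional coefficients, then decode once per pixel.
coeff_first = decode(blend(codes, weights), dictionary)

# Largest element-wise difference between the two routes.
max_gap = max(abs(a - b) for a, b in zip(dense_first, coeff_first))
```

Because both routes compute the same weighted sum in a different order, `max_gap` stays at floating-point rounding level. In this toy setup Route B blends vectors of length `NUM_ATOMS` (of which only `TOP_K` entries per Gaussian are non-zero) instead of length `FEATURE_DIM`, which is the kind of saving the abstract attributes to splatting sparse coefficients.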
Related papers
- Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting [52.18697134979677]
Recent advancements in computer vision have successfully extended Open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Existing methods employ codebooks or feature compression, causing information loss, thereby degrading segmentation quality. We introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Our framework outperforms state-of-the-art methods, while enabling real-time rendering with an approximate 43.7x speedup on 512-D feature maps.
arXiv Detail & Related papers (2025-12-24T04:16:18Z)
- Gen-LangSplat: Generalized Language Gaussian Splatting with Pre-Trained Feature Compression [0.0]
We introduce Gen-LangSplat, which replaces the scene-wise autoencoder with a generalized autoencoder, pre-trained extensively on the large-scale ScanNet dataset. This architectural shift enables the use of a fixed, compact latent space for language features across any new scene without any scene-specific training. Our results demonstrate that generalized embeddings can efficiently and accurately support open-vocabulary querying in novel 3D scenes.
arXiv Detail & Related papers (2025-10-27T02:13:38Z)
- LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos [24.61106294159454]
LongSplat addresses challenges in novel view synthesis (NVS) from casually captured long videos characterized by irregular camera motion, unknown camera poses, and expansive scenes. LongSplat is a robust unposed 3D Gaussian Splatting framework featuring: (1) Incremental Joint Optimization that concurrently optimizes camera poses and 3D Gaussians to avoid local minima and ensure global consistency; (2) a robust Pose Estimation Module leveraging learned 3D priors; and (3) an efficient Octree Anchor Formation mechanism that converts dense point clouds into anchors based on spatial density.
arXiv Detail & Related papers (2025-08-19T17:59:56Z)
- SLAG: Scalable Language-Augmented Gaussian Splatting [19.643023058839603]
Language-augmented scene representations hold great promise for large-scale robotics applications such as search-and-rescue, smart cities, and mining. Many of these scenarios are time-sensitive, requiring rapid scene encoding while also being data-intensive, necessitating scalable solutions. We introduce SLAG, a multi-GPU framework for language-augmented Gaussian splatting that enhances the speed and scalability of embedding large scenes.
arXiv Detail & Related papers (2025-05-12T23:32:24Z)
- Occam's LGS: An Efficient Approach for Language Gaussian Splatting [57.00354758206751]
We show that the complicated pipelines for language 3D Gaussian Splatting are simply unnecessary. We apply Occam's razor to the task at hand, leading to a highly efficient weighted multi-view feature aggregation technique.
arXiv Detail & Related papers (2024-12-02T18:50:37Z)
- FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally [66.28517576128381]
This study addresses the challenge of accurately segmenting 3D Gaussian Splatting from 2D masks.
We propose a straightforward yet globally optimal solver for 3D-GS segmentation.
Our method completes within 30 seconds, about 50$\times$ faster than the best existing methods.
arXiv Detail & Related papers (2024-09-12T17:58:13Z)
- LangSplat: 3D Language Gaussian Splatting [42.16849512832556]
LangSplat constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces.
LangSplat significantly outperforms the previous state-of-the-art method LERF by a large margin.
arXiv Detail & Related papers (2023-12-26T15:14:37Z)
- 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering [103.32717396287751]
We propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes.
A neural voxel encoding algorithm inspired by HexPlane is proposed to efficiently build features from 4D neural voxels.
Our 4D-GS method achieves real-time rendering under high resolutions, 82 FPS at an 800$\times$800 resolution on an RTX 3090 GPU.
arXiv Detail & Related papers (2023-10-12T17:21:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.