SparseLGS: Sparse View Language Embedded Gaussian Splatting
- URL: http://arxiv.org/abs/2412.02245v2
- Date: Wed, 04 Dec 2024 12:16:16 GMT
- Title: SparseLGS: Sparse View Language Embedded Gaussian Splatting
- Authors: Jun Hu, Zhang Chen, Zhong Li, Yi Xu, Juyong Zhang
- Abstract summary: We propose SparseLGS to address the challenge of 3D scene understanding with pose-free and sparse view input images.
Our method leverages a learning-based dense stereo model to handle pose-free and sparse inputs.
Experimental results show that SparseLGS achieves comparable quality when reconstructing semantic fields with fewer inputs.
- Score: 49.187761358726675
- Abstract: Recently, several studies have combined Gaussian Splatting with language embeddings to obtain scene representations for open-vocabulary 3D scene understanding. While these methods perform well, they essentially require very dense multi-view inputs, limiting their applicability in real-world scenarios. In this work, we propose SparseLGS to address the challenge of 3D scene understanding with pose-free and sparse view input images. Our method leverages a learning-based dense stereo model to handle pose-free and sparse inputs, and a three-step region matching approach to address the multi-view semantic inconsistency problem, which is especially important for sparse inputs. Different from directly learning high-dimensional CLIP features, we extract low-dimensional information and build bijections to avoid excessive learning and storage costs. We introduce a reconstruction loss during semantic training to improve Gaussian positions and shapes. To the best of our knowledge, we are the first to address the 3D semantic field problem with sparse pose-free inputs. Experimental results show that SparseLGS achieves comparable quality when reconstructing semantic fields with fewer inputs (3-4 views) compared to previous SOTA methods with dense input. Moreover, with the same sparse inputs, SparseLGS significantly outperforms prior methods in both quality and computation speed (a 5$\times$ speedup). Project page: https://ustc3dv.github.io/SparseLGS
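For intuition, here is a minimal numpy sketch of the low-dimensional-feature idea described in the abstract. It is not the authors' implementation, and the function names (`build_code_table`, `open_vocab_scores`) are hypothetical: it quantizes high-dimensional CLIP features into a small codebook, stores only a compact integer code per Gaussian, and recovers the full feature through an invertible table lookup at query time.

```python
# Illustrative sketch only -- NOT the SparseLGS implementation.
# Idea: instead of learning 512-D CLIP features per Gaussian, quantize
# features into a small codebook and store a compact code per Gaussian;
# the code <-> feature mapping is then a simple bijective table lookup.
import numpy as np

def normalize(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

def build_code_table(clip_feats, num_codes=64, iters=10, seed=0):
    """k-means over (N, 512) CLIP features; returns the (num_codes, 512)
    codebook plus each feature's integer code."""
    rng = np.random.default_rng(seed)
    centers = clip_feats[rng.choice(len(clip_feats), num_codes, replace=False)]
    for _ in range(iters):
        # assign each feature to its nearest center (cosine similarity)
        codes = (normalize(clip_feats) @ normalize(centers).T).argmax(axis=1)
        for k in range(num_codes):
            members = clip_feats[codes == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return centers, codes

def open_vocab_scores(text_feat, gaussian_codes, code_table):
    """Score Gaussians against a (512,) CLIP text embedding by decoding
    their compact codes back to full features via the lookup table."""
    decoded = code_table[gaussian_codes]  # invertible table lookup
    return normalize(decoded) @ normalize(text_feat)
```

Storing an integer (or a few-dimensional) code per Gaussian instead of a 512-D vector is what keeps the learning and storage costs low; the table restores the full feature only when an open-vocabulary query is evaluated.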
Related papers
- SLGaussian: Fast Language Gaussian Splatting in Sparse Views [15.0280871846496]
We propose SLGaussian, a feed-forward method for constructing 3D semantic fields from sparse viewpoints.
SLGaussian efficiently embeds language information in 3D space, offering a robust solution for accurate 3D scene understanding under sparse view conditions.
arXiv Detail & Related papers (2024-12-11T12:18:30Z)
- SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images [125.66499135980344]
We propose SparseGrasp, a novel open-vocabulary robotic grasping system.
SparseGrasp operates efficiently with sparse-view RGB images and handles scene updates quickly.
We show that SparseGrasp significantly outperforms state-of-the-art methods in terms of both speed and adaptability.
arXiv Detail & Related papers (2024-12-03T03:56:01Z)
- Occam's LGS: A Simple Approach for Language Gaussian Splatting [57.00354758206751]
We show that sophisticated techniques for language-grounded 3D Gaussian Splatting are simply unnecessary.
We apply Occam's razor to the task at hand and perform weighted multi-view feature aggregation.
Our approach achieves state-of-the-art results with a speed-up of two orders of magnitude.
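As a quick illustration of that aggregation step, here is a minimal numpy sketch with assumed interfaces (the per-view weights and sampled features would come from the splatting pipeline; this is not the paper's code):

```python
# Minimal sketch (assumed interfaces): lift 2D semantic features onto
# Gaussians by a weighted average over views, weighting each view by the
# Gaussian's rendering contribution in that view.
import numpy as np

def aggregate_features(weights: np.ndarray, view_feats: np.ndarray) -> np.ndarray:
    """weights:    (V, G) rendering weight of Gaussian g in view v
                   (e.g., its accumulated alpha-blending contribution).
    view_feats:    (V, G, C) 2D feature sampled at each Gaussian's
                   projected pixel in each view.
    Returns (G, C) per-Gaussian features: a weight-normalized average."""
    w = weights[..., None]                        # (V, G, 1)
    summed = (w * view_feats).sum(axis=0)         # (G, C)
    total = weights.sum(axis=0, keepdims=True).T  # (G, 1)
    return summed / np.maximum(total, 1e-8)
```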
arXiv Detail & Related papers (2024-12-02T18:50:37Z)
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting.
Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians.
We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference.
arXiv Detail & Related papers (2024-03-22T21:28:19Z)
- Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding [2.517953665531978]
We introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks.
Our representation achieves the best visual quality and language querying accuracy across current language-embedded representations.
arXiv Detail & Related papers (2023-11-30T11:50:07Z)
- Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding [58.924180772480504]
3D visual grounding involves finding a target object in a 3D scene that corresponds to a given sentence query.
We propose to leverage weakly supervised annotations to learn the 3D visual grounding model.
We design a novel semantic matching model that analyzes the semantic similarity between object proposals and sentences in a coarse-to-fine manner.
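A toy sketch of such coarse-to-fine matching (hypothetical function and inputs; the actual model is learned end-to-end): first rank proposals by sentence-level similarity, then re-rank the survivors by their best word-level match.

```python
# Illustrative sketch (assumed embeddings, not the authors' model):
# coarse-to-fine matching between object proposals and a sentence query.
import numpy as np

def cosine(a, b):
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
    return a @ b.T

def coarse_to_fine_match(proposal_feats, sentence_feat, word_feats, top_k=5):
    """proposal_feats: (P, C) features of 3D object proposals
    sentence_feat:     (C,) one sentence-level query embedding
    word_feats:        (W, C) per-word embeddings of the same query.
    Coarse stage keeps the top-k proposals by sentence similarity;
    fine stage re-ranks them by their best word-level match."""
    coarse = cosine(proposal_feats, sentence_feat[None]).squeeze(-1)  # (P,)
    keep = np.argsort(-coarse)[:top_k]
    fine = cosine(proposal_feats[keep], word_feats).max(axis=1)       # (k,)
    return keep[np.argmax(fine)]  # index of the best-matching proposal
```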
arXiv Detail & Related papers (2023-07-18T13:49:49Z)
- Focal Sparse Convolutional Networks for 3D Object Detection [121.45950754511021]
We introduce two new modules to enhance the capability of Sparse CNNs.
They are focal sparse convolution (Focals Conv) and its multi-modal variant, focal sparse convolution with fusion.
For the first time, we show that spatially learnable sparsity in sparse convolution is essential for sophisticated 3D object detection.
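The "spatially learnable sparsity" idea can be caricatured in a few lines. This is a toy numpy sketch, not the Focals Conv implementation (which couples a learned importance predictor with sparse convolution kernels and dynamic output expansion):

```python
# Toy sketch of the core idea: predict a per-voxel importance score and
# keep only important voxels, so later sparse convolutions spend compute
# where it matters.
import numpy as np

def prune_by_importance(coords, feats, weight, bias, threshold=0.5):
    """coords: (N, 3) integer positions of active voxels
    feats:     (N, C) their features
    weight (C,), bias (scalar): parameters of a tiny importance predictor.
    In Focals Conv this predictor is learned end-to-end; here it is just
    a fixed linear layer followed by a sigmoid."""
    logits = feats @ weight + bias                # (N,)
    importance = 1.0 / (1.0 + np.exp(-logits))    # sigmoid scores in (0, 1)
    keep = importance > threshold
    return coords[keep], feats[keep], importance
```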
arXiv Detail & Related papers (2022-04-26T17:34:10Z)