FLEG: Feed-Forward Language Embedded Gaussian Splatting from Any Views
- URL: http://arxiv.org/abs/2512.17541v1
- Date: Fri, 19 Dec 2025 13:04:13 GMT
- Title: FLEG: Feed-Forward Language Embedded Gaussian Splatting from Any Views
- Authors: Qijian Tian, Xin Tan, Jiayu Ying, Xuhong Wang, Yuan Xie, Lizhuang Ma
- Abstract summary: FLEG is a feed-forward network that reconstructs language-embedded 3D Gaussians from any views. We propose a 3D-annotation-free training framework for 2D-to-3D lifting from arbitrary uncalibrated and unposed multi-view images.
- Score: 52.02871618456553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present FLEG, a feed-forward network that reconstructs language-embedded 3D Gaussians from any views. Previous straightforward solutions combine feed-forward reconstruction with Gaussian heads but suffer from fixed input views and insufficient 3D training data. In contrast, we propose a 3D-annotation-free training framework for 2D-to-3D lifting from arbitrary uncalibrated and unposed multi-view images. Since the framework does not require 3D annotations, we can leverage large-scale video data with easily obtained 2D instance information to enrich semantic embedding. We also propose an instance-guided contrastive learning to align 2D semantics with the 3D representations. In addition, to mitigate the high memory and computational cost of dense views, we further propose a geometry-semantic hierarchical sparsification strategy. Our FLEG efficiently reconstructs language-embedded 3D Gaussian representation in a feed-forward manner from arbitrary sparse or dense views, jointly producing accurate geometry, high-fidelity appearance, and language-aligned semantics. Extensive experiments show that it outperforms existing methods on various related tasks. Project page: https://fangzhou2000.github.io/projects/fleg.
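The instance-guided contrastive learning mentioned in the abstract aligns 2D instance semantics with the reconstructed 3D representation. Below is a minimal sketch of such an instance-level InfoNCE-style objective; the function name, temperature value, and NumPy formulation are illustrative assumptions, not the paper's actual loss:

```python
import numpy as np

def instance_contrastive_loss(feat_2d, feat_3d, instance_ids, temperature=0.07):
    """Illustrative instance-guided contrastive loss (NOT the paper's exact loss).

    feat_2d: (N, D) per-pixel/per-region 2D semantic features.
    feat_3d: (N, D) corresponding 3D (e.g. rendered Gaussian) features.
    instance_ids: (N,) integer 2D instance labels; rows sharing an id are positives.
    """
    # Cosine similarity via L2-normalized features
    f2 = feat_2d / np.linalg.norm(feat_2d, axis=1, keepdims=True)
    f3 = feat_3d / np.linalg.norm(feat_3d, axis=1, keepdims=True)
    logits = (f2 @ f3.T) / temperature  # (N, N) similarity matrix

    losses = []
    for i in range(len(instance_ids)):
        pos = instance_ids == instance_ids[i]
        # Numerically stable log-softmax over row i
        m = logits[i].max()
        log_prob = logits[i] - (m + np.log(np.exp(logits[i] - m).sum()))
        # Average negative log-likelihood over positive pairs for row i
        losses.append(-log_prob[pos].mean())
    return float(np.mean(losses))
```

Rows that share a 2D instance id act as positives, while all other rows in the batch serve as negatives, so the 3D features are pulled toward the 2D semantics of the same instance without any 3D annotations.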
Related papers
- GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting [74.56128224977279]
We present GALA, a novel framework for open-vocabulary 3D scene understanding with 3D Gaussian Splatting (3DGS). GALA distills a scene-specific 3D instance feature field via self-supervised contrastive learning. It supports seamless 2D and 3D open-vocabulary queries and reduces memory consumption by avoiding per-Gaussian high-dimensional feature learning.
arXiv Detail & Related papers (2025-08-19T21:26:49Z) - Tackling View-Dependent Semantics in 3D Language Gaussian Splatting [80.88015191411714]
LaGa establishes cross-view semantic connections by decomposing the 3D scene into objects. It constructs view-aggregated semantic representations by clustering semantic descriptors and reweighting them based on multi-view semantics. Under the same settings, LaGa achieves a significant improvement of +18.7% mIoU over the previous SOTA on the LERF-OVS dataset.
arXiv Detail & Related papers (2025-05-30T16:06:32Z) - PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding [8.72555461868951]
3D Gaussian Splatting (3DGS) has shown encouraging performance on open vocabulary scene understanding tasks. Previous methods usually predict a heatmap between the scene feature and a text query and therefore cannot distinguish 3D instance-level information. We propose PanoGS, a novel and effective 3D panoptic open vocabulary scene understanding approach.
arXiv Detail & Related papers (2025-03-23T15:27:29Z) - Unifying 2D and 3D Vision-Language Understanding [85.84054120018625]
We introduce UniVLG, a unified architecture for 2D and 3D vision-language learning. UniVLG bridges the gap between existing 2D-centric models and the rich 3D sensory data available in embodied systems.
arXiv Detail & Related papers (2025-03-13T17:56:22Z) - GaussRender: Learning 3D Occupancy with Gaussian Rendering [86.89653628311565]
GaussRender is a module that improves 3D occupancy learning by enforcing projective consistency. Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-07T16:07:51Z) - OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies [112.80292725951921]
OVGaussian is a generalizable Open-Vocabulary 3D semantic segmentation framework based on the 3D Gaussian representation. We first construct a large-scale 3D scene dataset based on 3DGS, dubbed SegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images. To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR), which leverages a
arXiv Detail & Related papers (2024-12-31T07:55:35Z) - MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model [34.245635412589806]
MeshFormer is a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision.
It can be integrated with 2D diffusion models to enable fast single-image-to-3D and text-to-3D tasks.
arXiv Detail & Related papers (2024-08-19T17:55:17Z) - Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting.
Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians.
We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference.
arXiv Detail & Related papers (2024-03-22T21:28:19Z)
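The Semantic Gaussians entry above describes mapping 2D semantic features into a semantic component of 3D Gaussians. A hedged sketch of the underlying 2D-to-3D lifting step, assuming points already expressed in the camera frame and a standard pinhole intrinsics matrix K (function and variable names are illustrative, not from any of the papers listed):

```python
import numpy as np

def lift_2d_features_to_3d(points, feats_2d, K, H, W):
    """Illustrative 2D-to-3D feature lifting via pinhole projection.

    points:   (N, 3) 3D points in the camera frame.
    feats_2d: (H, W, D) per-pixel 2D semantic feature map.
    K:        (3, 3) camera intrinsics matrix.
    Returns per-point features (N, D) and a validity mask (N,).
    """
    uv = (K @ points.T).T            # (N, 3) homogeneous pixel coordinates
    uv = uv[:, :2] / uv[:, 2:3]      # perspective divide
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    # Keep points that land inside the image and lie in front of the camera
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (points[:, 2] > 0)
    point_feats = np.zeros((points.shape[0], feats_2d.shape[-1]))
    point_feats[valid] = feats_2d[v[valid], u[valid]]
    return point_feats, valid
```

In a multi-view setting this gather would be repeated per view and the per-point features averaged, which is the general pattern these open-vocabulary 3DGS methods build on.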