Splat Feature Solver
- URL: http://arxiv.org/abs/2508.12216v1
- Date: Sun, 17 Aug 2025 03:13:06 GMT
- Title: Splat Feature Solver
- Authors: Butian Xiong, Rong Liu, Kenneth Xu, Meida Chen, Andrew Feng,
- Abstract summary: We present a kernel- and feature-agnostic formulation of the feature lifting problem as a sparse linear inverse problem.<n>We introduce two complementary regularization strategies to stabilize the solution and enhance semantic fidelity.<n>Our approach achieves state-of-the-art performance on open-vocabulary 3D segmentation benchmarks.
- Score: 2.385329252971734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature lifting has emerged as a crucial component in 3D scene understanding, enabling the attachment of rich image feature descriptors (e.g., DINO, CLIP) onto splat-based 3D representations. The core challenge lies in optimally assigning rich general attributes to 3D primitives while addressing the inconsistency issues from multi-view images. We present a unified, kernel- and feature-agnostic formulation of the feature lifting problem as a sparse linear inverse problem, which can be solved efficiently in closed form. Our approach admits a provable upper bound on the global optimal error under convex losses for delivering high quality lifted features. To address inconsistencies and noise in multi-view observations, we introduce two complementary regularization strategies to stabilize the solution and enhance semantic fidelity. Tikhonov Guidance enforces numerical stability through soft diagonal dominance, while Post-Lifting Aggregation filters noisy inputs via feature clustering. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on open-vocabulary 3D segmentation benchmarks, outperforming training-based, grouping-based, and heuristic-forward baselines while producing the lifted features in minutes. Code is available at \href{https://github.com/saliteta/splat-distiller.git}{\textbf{github}}. We also have a \href{https://splat-distiller.pages.dev/}
Related papers
- COMPOSE: Hypergraph Cover Optimization for Multi-view 3D Human Pose Estimation [58.47973015036709]
3D pose estimation from sparse multi-views is a critical task for action recognition, sports analysis, and human-robot interaction.<n>We propose COMPOSE, a novel framework that formulates multi-view pose correspondence matching as a hypergraph problem.<n> COMPOSE achieves improvements of up to 23% in average precision over previous optimization-based methods and up to 11% over self-supervised end-to-end learned methods.
arXiv Detail & Related papers (2026-01-14T18:50:17Z) - UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning [6.502142457981839]
3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have advanced novel-view synthesis.<n>Recent methods extend multi-view 2D segmentation to 3D, enabling instance/semantic segmentation for better scene understanding.<n>Key challenge is the inconsistency of 2D instance labels across views, leading to poor 3D predictions.<n>We propose a unified framework that merges these steps, reducing training time and improving performance by introducing a learnable feature embedding for segmentation in Gaussian primitives.
arXiv Detail & Related papers (2025-12-31T10:20:01Z) - OpenInsGaussian: Open-vocabulary Instance Gaussian Segmentation with Context-aware Cross-view Fusion [89.98812408058336]
We introduce textbfOpenInsGaussian, an textbfOpen-vocabulary textbfInstance textbfGaussian segmentation framework with Context-aware Cross-view Fusion.<n>OpenInsGaussian achieves state-of-the-art results in open-vocabulary 3D Gaussian segmentation, outperforming existing baselines by a large margin.
arXiv Detail & Related papers (2025-10-21T03:24:12Z) - Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting [86.15347226865826]
We design a new end-to-end object-aware lifting approach, named Unified-Lift.<n>We augment each Gaussian point with an additional Gaussian-level feature learned using a contrastive loss to encode instance information.<n>We conduct experiments on three benchmarks: LERF-Masked, Replica, and Messy Rooms.
arXiv Detail & Related papers (2025-03-18T08:42:23Z) - NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features [50.212836834889146]
We propose an efficient and novel visual localization approach based on the neural implicit map with complementary features.<n>Specifically, to enforce geometric constraints and reduce storage requirements, we implicitly learn a 3D keypoint descriptor field.<n>To further address the semantic ambiguity of descriptors, we introduce additional semantic contextual feature fields.
arXiv Detail & Related papers (2025-03-08T08:04:27Z) - PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.<n>Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z) - Stochastic Nested Compositional Bi-level Optimization for Robust Feature
Learning [11.236838268731804]
We develop and analyze algorithms for solving nested bi-level optimization problems.
Our proposed algorithm does not rely on matrix complexity or mini-batches.
arXiv Detail & Related papers (2023-07-11T15:52:04Z) - Large-scale Point Cloud Registration Based on Graph Matching
Optimization [30.92028761652611]
We propose a underlineGraph underlineMatching underlineOptimization based underlineNetwork.
The proposed method has been evaluated on the 3DMatch/3DLoMatch benchmarks and the KITTI benchmark.
arXiv Detail & Related papers (2023-02-12T03:29:35Z) - IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding
Alignment [58.8330387551499]
We formulate the problem as estimation of point-wise trajectories (i.e., smooth curves)
We propose IDEA-Net, an end-to-end deep learning framework, which disentangles the problem under the assistance of the explicitly learned temporal consistency.
We demonstrate the effectiveness of our method on various point cloud sequences and observe large improvement over state-of-the-art methods both quantitatively and visually.
arXiv Detail & Related papers (2022-03-22T10:14:08Z) - Multiway Non-rigid Point Cloud Registration via Learned Functional Map
Synchronization [105.14877281665011]
We present SyNoRiM, a novel way to register multiple non-rigid shapes by synchronizing the maps relating learned functions defined on the point clouds.
We demonstrate via extensive experiments that our method achieves a state-of-the-art performance in registration accuracy.
arXiv Detail & Related papers (2021-11-25T02:37:59Z) - DFC: Deep Feature Consistency for Robust Point Cloud Registration [0.4724825031148411]
We present a novel learning-based alignment network for complex alignment scenes.
We validate our approach on the 3DMatch dataset and the KITTI odometry dataset.
arXiv Detail & Related papers (2021-11-15T08:27:21Z) - Bi-level Feature Alignment for Versatile Image Translation and
Manipulation [88.5915443957795]
Generative adversarial networks (GANs) have achieved great success in image translation and manipulation.
High-fidelity image generation with faithful style control remains a grand challenge in computer vision.
This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance.
arXiv Detail & Related papers (2021-07-07T05:26:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.