TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction
- URL: http://arxiv.org/abs/2603.00697v1
- Date: Sat, 28 Feb 2026 15:13:13 GMT
- Title: TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction
- Authors: Yihui Li, Chengxin Lv, Zichen Tang, Hongyu Yang, Di Huang
- Abstract summary: TokenSplat is a feed-forward framework for joint 3D Gaussian reconstruction and camera pose estimation. At its core, TokenSplat introduces a Token-aligned Gaussian Prediction module. It aggregates multi-scale contextual features to enable long-range cross-view reasoning.
- Score: 45.41545304485825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present TokenSplat, a feed-forward framework for joint 3D Gaussian reconstruction and camera pose estimation from unposed multi-view images. At its core, TokenSplat introduces a Token-aligned Gaussian Prediction module that aligns semantically corresponding information across views directly in the feature space. Guided by coarse token positions and fusion confidence, it aggregates multi-scale contextual features to enable long-range cross-view reasoning and reduce redundancy from overlapping Gaussians. To further enhance pose robustness and disentangle viewpoint cues from scene semantics, TokenSplat employs learnable camera tokens and an Asymmetric Dual-Flow Decoder (ADF-Decoder) that enforces directionally constrained communication between camera and image tokens. This maintains clean factorization within a feed-forward architecture, enabling coherent reconstruction and stable pose estimation without iterative refinement. Extensive experiments demonstrate that TokenSplat achieves higher reconstruction fidelity and novel-view synthesis quality in pose-free settings, and significantly improves pose estimation accuracy compared to prior pose-free methods. Project page: https://kidleyh.github.io/tokensplat/.
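The "directionally constrained communication between camera and image tokens" mentioned in the abstract can be pictured as an attention mask that permits information flow in one direction only. The sketch below is illustrative and is not the authors' ADF-Decoder: the abstract does not state which direction is blocked, so the default here (camera tokens may read image tokens, image tokens may not read camera tokens) is an assumption, and the names `build_asymmetric_mask` and `masked_attention` are hypothetical. It uses plain single-head dot-product attention with no learned projections.

```python
import math


def build_asymmetric_mask(num_cam, num_img, cam_sees_img=True, img_sees_cam=False):
    """Return mask[q][k] = True when query token q may attend to key token k.

    Tokens are ordered with the camera tokens first, then the image tokens.
    The default direction (camera reads image, not the reverse) is an
    assumption made for illustration only.
    """
    n = num_cam + num_img
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            q_is_cam = q < num_cam
            k_is_cam = k < num_cam
            if q_is_cam == k_is_cam:
                mask[q][k] = True          # same group: always allowed
            elif q_is_cam:
                mask[q][k] = cam_sees_img  # camera query, image key
            else:
                mask[q][k] = img_sees_cam  # image query, camera key
    return mask


def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention in which masked-out keys get zero weight."""
    dim = len(queries[0])
    outputs = []
    for q_idx, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            if mask[q_idx][k_idx] else None
            for k_idx, k in enumerate(keys)
        ]
        # Every token may attend to itself, so at least one score is present.
        top = max(s for s in scores if s is not None)
        weights = [math.exp(s - top) if s is not None else 0.0 for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# One camera token followed by two image tokens, each of dimension 2.
mask = build_asymmetric_mask(num_cam=1, num_img=2)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated = masked_attention(tokens, tokens, tokens, mask)
```

With this mask, changing the camera token changes only the camera token's output; the image-token outputs are unaffected, which is one way to keep viewpoint cues from leaking into scene features.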
Related papers
- AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views [68.94737256959661]
AnySplat is a feed-forward network for novel view synthesis from uncalibrated image collections. A single forward pass yields a set of 3D Gaussian primitives encoding both scene geometry and appearance. In extensive zero-shot evaluations, AnySplat matches the quality of pose-aware baselines in both sparse- and dense-view scenarios.
arXiv Detail & Related papers (2025-05-29T17:49:56Z)
- VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames [8.746291192336056]
We present VicaSplat, a novel framework for joint 3D Gaussians reconstruction and camera pose estimation. The core of our method lies in a novel transformer-based network architecture.
arXiv Detail & Related papers (2025-03-13T11:56:05Z)
- "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space. Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z)
- FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction [69.63414788486578]
FreeSplatter is a scalable feed-forward framework that generates high-quality 3D Gaussians from uncalibrated sparse-view images. Our approach employs a streamlined transformer architecture where self-attention blocks facilitate information exchange. We develop two specialized variants, for object-centric and scene-level reconstruction, trained on comprehensive datasets.
arXiv Detail & Related papers (2024-12-12T18:52:53Z)
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices. Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)