WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
- URL: http://arxiv.org/abs/2509.05296v1
- Date: Fri, 05 Sep 2025 17:59:47 GMT
- Title: WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
- Authors: Zizun Li, Jianjun Zhou, Yifan Wang, Haoyu Guo, Wenzheng Chang, Yang Zhou, Haoyi Zhu, Junyi Chen, Chunhua Shen, Tong He
- Abstract summary: WinT3R is a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. We introduce a sliding window mechanism that ensures sufficient information exchange among frames within the window. We leverage a compact representation of cameras and maintain a global camera token pool, which enhances the reliability of camera pose estimation.
- Score: 54.93856767365114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. Previous methods suffer from a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among frames within the window, thereby improving the quality of geometric predictions without large computation. In addition, we leverage a compact representation of cameras and maintain a global camera token pool, which enhances the reliability of camera pose estimation without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in terms of online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. Code and model are publicly available at https://github.com/LiZizun/WinT3R.
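The two ideas named in the abstract, a sliding window over the frame stream and a global pool of compact per-frame camera tokens, can be illustrated with a toy sketch. This is not the authors' implementation: the window size, the stride, the names `sliding_windows`, `CameraTokenPool`, `compact_token`, and `run_stream`, and the use of a simple mean as the "token" are all assumptions made for illustration. The actual model is in the linked repository.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Frame = Sequence[float]  # hypothetical stand-in for per-frame image features


def sliding_windows(
    frames: Iterable[Tuple[int, Frame]], window_size: int = 4, stride: int = 2
) -> Iterator[List[Tuple[int, Frame]]]:
    """Yield overlapping windows of (frame_id, frame) pairs from a stream."""
    buf: deque = deque(maxlen=window_size)
    since_emit = 0
    for item in frames:
        buf.append(item)
        since_emit += 1
        # Emit once the buffer is full and `stride` new frames have arrived.
        if len(buf) == window_size and since_emit >= stride:
            yield list(buf)
            since_emit = 0


class CameraTokenPool:
    """Keeps one compact token per frame for the whole stream (assumed design)."""

    def __init__(self) -> None:
        self._tokens: Dict[int, float] = {}

    def update(self, frame_id: int, token: float) -> None:
        self._tokens[frame_id] = token

    def __len__(self) -> int:
        return len(self._tokens)


def compact_token(frame: Frame) -> float:
    # Placeholder: a real model would produce a learned token; we take the mean.
    return sum(frame) / len(frame)


def run_stream(
    frames: Sequence[Frame], window_size: int = 4, stride: int = 2
) -> List[Tuple[List[int], int]]:
    """Process a stream window by window, growing the global token pool."""
    pool = CameraTokenPool()
    log: List[Tuple[List[int], int]] = []
    for window in sliding_windows(enumerate(frames), window_size, stride):
        for frame_id, frame in window:
            pool.update(frame_id, compact_token(frame))
        # Record which frames shared this window and the pool size afterwards.
        log.append(([fid for fid, _ in window], len(pool)))
    return log


frames = [[float(i), float(i + 1)] for i in range(8)]
log = run_stream(frames)
```

In this sketch, frames inside a window are processed together, while the pool holds one small entry per frame seen so far, so the global state grows far more slowly than it would if full frame features were retained.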
Related papers
- OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects [58.38338242973447]
OnlineSplatter is a novel framework generating high-quality, object-centric 3D Gaussians directly from RGB frames. Our approach anchors reconstruction using the first frame and progressively refines the object representation through a dense Gaussian primitive field. Our core contribution is a dual-key memory module combining latent appearance-geometry keys with explicit directional keys.
arXiv Detail & Related papers (2025-10-23T14:37:25Z)
- BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images [21.811586185200706]
This paper addresses the challenge of reconstructing vehicles from sparse-view inputs. We leverage depth maps and a robust pose estimation architecture to synthesize novel views. We present a novel dataset featuring both synthetic and real-world public transportation vehicles.
arXiv Detail & Related papers (2025-07-16T10:04:35Z)
- Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass [68.78222900840132]
We propose Fast 3D Reconstruction (Fast3R), a novel multi-view generalization to DUSt3R that achieves efficient and scalable 3D reconstruction by processing many views in parallel. Fast3R demonstrates state-of-the-art performance, with significant improvements in inference speed and reduced error accumulation.
arXiv Detail & Related papers (2025-01-23T18:59:55Z)
- VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment [54.66217340264935]
VideoLifter is a novel video-to-3D pipeline that leverages a local-to-global strategy on a fragment basis. It significantly accelerates the reconstruction process, reducing training time by over 82% while holding better visual quality than current SOTA methods.
arXiv Detail & Related papers (2025-01-03T18:52:36Z)
- FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction [69.63414788486578]
FreeSplatter is a scalable feed-forward framework that generates high-quality 3D Gaussians from uncalibrated sparse-view images. Our approach employs a streamlined transformer architecture where self-attention blocks facilitate information exchange. We develop two specialized variants--for object-centric and scene-level reconstruction--trained on comprehensive datasets.
arXiv Detail & Related papers (2024-12-12T18:52:53Z)
- Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration [34.18403601269181]
DM-Calib is a diffusion-based approach for estimating pinhole camera intrinsic parameters from a single input image. We introduce a new image-based representation, termed Camera Image, which losslessly encodes the numerical camera intrinsics. By fine-tuning a stable diffusion model to generate a Camera Image from a single RGB input, we can extract camera intrinsics via a RANSAC operation.
arXiv Detail & Related papers (2024-11-26T09:04:37Z)
- KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction [58.04846444985808]
This paper introduces KRONC, a novel approach aimed at inferring view poses by leveraging prior knowledge about the object to reconstruct and its representation through semantic keypoints.
With a focus on vehicle scenes, KRONC is able to estimate the position of the views as a solution to a light optimization problem targeting the convergence of keypoints' back-projections to a singular point.
arXiv Detail & Related papers (2024-09-09T08:08:05Z)
- DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model [31.43307762723943]
The flat lensless camera design reduces the camera size and weight significantly.
The image is recovered from the raw sensor measurements using a reconstruction algorithm.
We propose utilizing a pre-trained diffusion model with a control network and a learned separable transformation for reconstruction.
arXiv Detail & Related papers (2024-08-14T13:20:52Z)
- Unrolled Primal-Dual Networks for Lensless Cameras [0.45880283710344055]
We show that learning a supervised primal-dual reconstruction method results in image quality matching state of the art in the literature.
This improvement stems from our finding that embedding learnable forward and adjoint models in a learned primal-dual optimization framework can even improve the quality of reconstructed images.
arXiv Detail & Related papers (2022-03-08T19:21:39Z)
- Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video [90.93141123721713]
Thin structures, such as wire-frame sculptures, fences, cables, power lines, and tree branches, are common in the real world.
It is extremely challenging to acquire their 3D digital models using traditional image-based or depth-based reconstruction methods because thin structures often lack distinct point features and have severe self-occlusion.
We propose the first approach that simultaneously estimates camera motion and reconstructs the geometry of complex 3D thin structures in high quality from a color video captured by a handheld camera.
arXiv Detail & Related papers (2020-05-07T10:39:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.