TTT3R: 3D Reconstruction as Test-Time Training
- URL: http://arxiv.org/abs/2509.26645v3
- Date: Thu, 16 Oct 2025 11:37:35 GMT
- Title: TTT3R: 3D Reconstruction as Test-Time Training
- Authors: Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen
- Abstract summary: We revisit 3D reconstruction foundation models from a Test-Time Training perspective. We leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate. This training-free intervention, termed TTT3R, substantially improves length generalization.
- Score: 69.51086319339662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern Recurrent Neural Networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, balancing the retention of historical information against adaptation to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a $2\times$ improvement in global pose estimation over baselines, while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code available at https://rover-xingyu.github.io/TTT3R
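The core mechanism in the abstract, a memory update whose learning rate is derived from the alignment confidence between the memory state and the incoming observation, can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy (the cosine-similarity confidence and the convex blend are illustrative choices, not the paper's closed-form derivation): well-aligned observations barely perturb the memory, while novel observations trigger a larger adaptation step.

```python
import numpy as np

def confidence_gated_update(memory, observation):
    """One recurrent memory update with a confidence-derived learning rate.

    Illustrative only: confidence here is cosine similarity rescaled to
    [0, 1], and the update is a convex blend of old state and new input.
    """
    # Alignment confidence in [0, 1] via cosine similarity.
    num = float(memory @ observation)
    den = np.linalg.norm(memory) * np.linalg.norm(observation) + 1e-8
    confidence = 0.5 * (num / den + 1.0)
    # Confidence-derived learning rate: adapt more when alignment is low.
    lr = 1.0 - confidence
    return (1.0 - lr) * memory + lr * observation

memory = np.array([1.0, 0.0])
obs_aligned = np.array([1.0, 0.1])   # nearly parallel to the memory
obs_novel = np.array([0.0, 1.0])     # orthogonal to the memory
# The aligned observation barely moves the state; the novel one moves it more.
step_aligned = np.linalg.norm(confidence_gated_update(memory, obs_aligned) - memory)
step_novel = np.linalg.norm(confidence_gated_update(memory, obs_novel) - memory)
```

The design choice this illustrates is the stated trade-off: a fixed learning rate either forgets history or ignores new evidence, whereas a confidence-derived rate interpolates between the two per observation.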
Related papers
- LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory [97.14005794889134]
We present LoGeR, a novel architecture that scales dense 3D reconstruction to extremely long sequences without post-optimization. LoGeR processes video streams in chunks, leveraging strong bidirectional priors for high-fidelity intra-chunk reasoning. This memory architecture enables LoGeR to be trained on sequences of 128 frames, and generalize up to thousands of frames during inference.
arXiv Detail & Related papers (2026-03-03T18:55:37Z) - RnG: A Unified Transformer for Complete 3D Modeling from Partial Observations [70.83499963694238]
RnG (Reconstruction and Generation) is a novel feed-forward Transformer that unifies reconstruction and generation. It reconstructs visible geometry and generates plausible, coherent unseen geometry and appearance. Our method achieves state-of-the-art performance in both generalizable 3D reconstruction and novel view generation.
arXiv Detail & Related papers (2026-03-01T17:25:32Z) - tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction [47.43504457409347]
tttLRM is a novel large 3D reconstruction model that leverages a Test-Time Training layer. Our framework efficiently compresses multiple image observations into the fast weights of the TTT layer. An online learning variant of our model supports progressive 3D reconstruction and refinement from streaming observations.
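The "fast weights" idea behind a Test-Time Training layer can be sketched in a few lines. This is a generic sketch, not tttLRM's actual design: a linear layer's weights act as memory, and each incoming token takes one gradient step on a self-supervised reconstruction loss, compressing the observation into the weights at inference time.

```python
import numpy as np

def ttt_layer_step(W, x, lr=0.1):
    """One fast-weight update of a linear TTT layer (illustrative sketch).

    Gradient step on the self-supervised loss 0.5 * ||W x - x||^2,
    so repeated tokens become reconstructable from the weights alone.
    """
    pred = W @ x
    grad = np.outer(pred - x, x)  # d/dW of 0.5 * ||W x - x||^2
    return W - lr * grad

rng = np.random.default_rng(0)
x = rng.normal(size=4)
x = x / np.linalg.norm(x)  # unit-norm token keeps the fixed step size stable
W = np.zeros((4, 4))
for _ in range(200):
    W = ttt_layer_step(W, x)
# After repeated steps on the same token, reconstruction error shrinks.
```

Unlike a fixed recurrent state, the "memory" here is literally a weight matrix being trained at test time, which is what the Test-Time Training framing refers to.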
arXiv Detail & Related papers (2026-02-23T18:59:45Z) - STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer [72.88105562624838]
We present STream3R, a novel approach to 3D reconstruction that reformulates pointmap prediction as a decoder-only Transformer problem. By learning geometric priors from large-scale 3D datasets, STream3R generalizes well to diverse and challenging scenarios. Our results underscore the potential of causal Transformer models for online 3D perception, paving the way for real-time 3D understanding in streaming environments.
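The decoder-only, causal formulation that makes streaming inference possible can be shown with single-head causal self-attention. A generic sketch of the mechanism, not STream3R's exact model: each frame token may attend only to itself and earlier frames, so new frames can be processed without revisiting the future.

```python
import numpy as np

def causal_attention(Q, K, V):
    """Single-head causal self-attention over a sequence of frame tokens."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    n = scores.shape[0]
    # Mask out future positions (strictly upper triangle).
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out = causal_attention(Q, K, V)
# The first frame can only attend to itself, so out[0] equals V[0].
```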
arXiv Detail & Related papers (2025-08-14T17:58:05Z) - Test3R: Learning to Reconstruct 3D at Test Time [58.0912500917036]
Test3R is a surprisingly simple test-time learning technique that significantly boosts geometric accuracy. Our technique significantly outperforms previous state-of-the-art methods on the 3D reconstruction and multi-view depth estimation tasks.
arXiv Detail & Related papers (2025-06-16T17:56:22Z) - Beyond Existance: Fulfill 3D Reconstructed Scenes with Pseudo Details [12.167127919679022]
We introduce a new training method that integrates diffusion models and multi-scale training using pseudo-ground-truth data. Our method achieves state-of-the-art performance across various benchmarks and extends the capabilities of 3D reconstruction beyond training datasets.
arXiv Detail & Related papers (2025-03-06T02:46:10Z) - Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z) - GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
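The unrolled structure described above, alternating a physics-based data-consistency step with a learned regularizer, can be sketched with a hand-coded stand-in for the network. This is a generic unrolled-reconstruction sketch, not GLEAM's trained model: the soft-threshold operator plays the role of the neural denoiser.

```python
import numpy as np

def unrolled_recon(A, y, n_unrolls=10, step=0.1, thresh=0.01):
    """Unrolled reconstruction of x from measurements y = A x.

    Each unroll alternates:
      1. data consistency: gradient step on ||A x - y||^2 (the physics), and
      2. regularization: soft-thresholding, standing in for a learned prior.
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_unrolls):
        # Physics-based consistency with the forward model A.
        x = x - step * A.T @ (A @ x - y)
        # Stand-in regularizer (a trained network in an unrolled method).
        x = np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)
    return x

A = np.eye(3)                      # trivial forward model for illustration
y = np.array([1.0, -2.0, 3.0])
x_hat = unrolled_recon(A, y, n_unrolls=50, step=0.5, thresh=0.01)
```

In a real unrolled network the loop is fixed-length and the regularizer's parameters are trained end-to-end; the greedy strategy in GLEAM concerns how that training is made memory-efficient.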
arXiv Detail & Related papers (2022-07-18T06:01:29Z) - Least Redundant Gated Recurrent Neural Network [0.0]
We introduce a recurrent neural architecture called Deep Memory Update (DMU).
It is based on updating the previous memory state with a deep transformation of the lagged state and the network input.
Its training is stable and fast because its learning rate is tied to the size of the module.
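The update scheme described, blending the previous memory state with a transformation of the lagged state and the current input, resembles a gated recurrent step. A minimal sketch under that reading (DMU's actual transformation is deep; here it is collapsed to a single linear layer, and all names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dmu_step(h_prev, x, Wz, Wh):
    """One gated memory update: blend the previous state with a candidate
    computed from the lagged state and the current input."""
    inp = np.concatenate([h_prev, x])
    z = sigmoid(Wz @ inp)      # update gate in (0, 1)
    cand = np.tanh(Wh @ inp)   # candidate memory in (-1, 1)
    return (1.0 - z) * h_prev + z * cand

rng = np.random.default_rng(0)
dim_h, dim_x = 4, 3
Wz = rng.normal(scale=0.5, size=(dim_h, dim_h + dim_x))
Wh = rng.normal(scale=0.5, size=(dim_h, dim_h + dim_x))
h = np.zeros(dim_h)
for _ in range(10):
    h = dmu_step(h, rng.normal(size=dim_x), Wz, Wh)
# The state stays bounded: each step is a convex blend of values in [-1, 1].
```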
arXiv Detail & Related papers (2021-05-28T20:24:00Z) - Learning monocular 3D reconstruction of articulated categories from motion [39.811816510186475]
Video self-supervision forces the consistency of consecutive 3D reconstructions by a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
arXiv Detail & Related papers (2021-03-30T13:50:27Z) - SCFusion: Real-time Incremental Scene Reconstruction with Semantic
Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z) - Procrustean Regression Networks: Learning 3D Structure of Non-Rigid
Objects from 2D Annotations [42.476537776831314]
We propose a novel framework for training neural networks which is capable of learning 3D information of non-rigid objects.
The proposed framework shows superior reconstruction performance to the state-of-the-art method on the Human 3.6M, 300-VW, and SURREAL datasets.
arXiv Detail & Related papers (2020-07-21T17:29:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.