TTT3R: 3D Reconstruction as Test-Time Training
- URL: http://arxiv.org/abs/2509.26645v3
- Date: Thu, 16 Oct 2025 11:37:35 GMT
- Title: TTT3R: 3D Reconstruction as Test-Time Training
- Authors: Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen
- Abstract summary: We revisit 3D reconstruction foundation models from a Test-Time Training perspective. We leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate. This training-free intervention, termed TTT3R, substantially improves length generalization.
- Score: 69.51086319339662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern Recurrent Neural Networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, balancing the retention of historical information against adaptation to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a $2\times$ improvement in global pose estimation over baselines, while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code available at https://rover-xingyu.github.io/TTT3R
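The core mechanism in the abstract, a memory update whose learning rate is derived from the alignment confidence between the memory state and the incoming observation, can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy (the cosine-similarity confidence and the convex blend are illustrative choices, not the paper's closed-form derivation): well-aligned observations barely perturb the memory, while novel observations trigger a larger adaptation step.

```python
import numpy as np

def confidence_gated_update(memory, observation):
    """One recurrent memory update with a confidence-derived learning rate.

    Illustrative only: confidence here is cosine similarity rescaled to
    [0, 1], and the update is a convex blend of old state and new input.
    """
    # Alignment confidence in [0, 1] via cosine similarity.
    num = float(memory @ observation)
    den = np.linalg.norm(memory) * np.linalg.norm(observation) + 1e-8
    confidence = 0.5 * (num / den + 1.0)
    # Confidence-derived learning rate: adapt more when alignment is low.
    lr = 1.0 - confidence
    return (1.0 - lr) * memory + lr * observation

memory = np.array([1.0, 0.0])
obs_aligned = np.array([1.0, 0.1])   # nearly parallel to the memory
obs_novel = np.array([0.0, 1.0])     # orthogonal to the memory
# The aligned observation barely moves the state; the novel one moves it more.
step_aligned = np.linalg.norm(confidence_gated_update(memory, obs_aligned) - memory)
step_novel = np.linalg.norm(confidence_gated_update(memory, obs_novel) - memory)
```

The design choice this illustrates is the stated trade-off: a fixed learning rate either forgets history or ignores new evidence, whereas a confidence-derived rate interpolates between the two per observation.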
Related papers
- LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory [97.14005794889134]
We present LoGeR, a novel architecture that scales dense 3D reconstruction to extremely long sequences without post-optimization. LoGeR processes video streams in chunks, leveraging strong bidirectional priors for high-fidelity intra-chunk reasoning. This memory architecture enables LoGeR to be trained on sequences of 128 frames, and generalize up to thousands of frames during inference.
arXiv Detail & Related papers (2026-03-03T18:55:37Z) - RnG: A Unified Transformer for Complete 3D Modeling from Partial Observations [70.83499963694238]
RnG (Reconstruction and Generation) is a novel feed-forward Transformer that unifies reconstruction and generation. It reconstructs visible geometry and generates plausible, coherent unseen geometry and appearance. Our method achieves state-of-the-art performance in both generalizable 3D reconstruction and novel view generation.
arXiv Detail & Related papers (2026-03-01T17:25:32Z) - tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction [47.43504457409347]
tttLRM is a novel large 3D reconstruction model that leverages a Test-Time Training layer. Our framework efficiently compresses multiple image observations into the fast weights of the TTT layer. An online learning variant of our model supports progressive 3D reconstruction and refinement from streaming observations.
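The "fast weights" idea behind a Test-Time Training layer can be sketched in a few lines. This is a generic sketch, not tttLRM's actual design: a linear layer's weights act as memory, and each incoming token takes one gradient step on a self-supervised reconstruction loss, compressing the observation into the weights at inference time.

```python
import numpy as np

def ttt_layer_step(W, x, lr=0.1):
    """One fast-weight update of a linear TTT layer (illustrative sketch).

    Gradient step on the self-supervised loss 0.5 * ||W x - x||^2,
    so repeated tokens become reconstructable from the weights alone.
    """
    pred = W @ x
    grad = np.outer(pred - x, x)  # d/dW of 0.5 * ||W x - x||^2
    return W - lr * grad

rng = np.random.default_rng(0)
x = rng.normal(size=4)
x = x / np.linalg.norm(x)  # unit-norm token keeps the fixed step size stable
W = np.zeros((4, 4))
for _ in range(200):
    W = ttt_layer_step(W, x)
# After repeated steps on the same token, reconstruction error shrinks.
```

Unlike a fixed recurrent state, the "memory" here is literally a weight matrix being trained at test time, which is what the Test-Time Training framing refers to.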
arXiv Detail & Related papers (2026-02-23T18:59:45Z) - STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer [72.88105562624838]
We present STream3R, a novel approach to 3D reconstruction that reformulates pointmap prediction as a decoder-only Transformer problem. By learning geometric priors from large-scale 3D datasets, STream3R generalizes well to diverse and challenging scenarios. Our results underscore the potential of causal Transformer models for online 3D perception, paving the way for real-time 3D understanding in streaming environments.
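The decoder-only, causal formulation that makes streaming inference possible can be shown with single-head causal self-attention. A generic sketch of the mechanism, not STream3R's exact model: each frame token may attend only to itself and earlier frames, so new frames can be processed without revisiting the future.

```python
import numpy as np

def causal_attention(Q, K, V):
    """Single-head causal self-attention over a sequence of frame tokens."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    n = scores.shape[0]
    # Mask out future positions (strictly upper triangle).
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out = causal_attention(Q, K, V)
# The first frame can only attend to itself, so out[0] equals V[0].
```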
arXiv Detail & Related papers (2025-08-14T17:58:05Z) - Test3R: Learning to Reconstruct 3D at Test Time [58.0912500917036]
Test3R is a surprisingly simple test-time learning technique that significantly boosts geometric accuracy. Our technique significantly outperforms previous state-of-the-art methods on the 3D reconstruction and multi-view depth estimation tasks.
arXiv Detail & Related papers (2025-06-16T17:56:22Z) - Beyond Existance: Fulfill 3D Reconstructed Scenes with Pseudo Details [12.167127919679022]
We introduce a new training method that integrates diffusion models and multi-scale training using pseudo-ground-truth data. Our method achieves state-of-the-art performance across various benchmarks and extends the capabilities of 3D reconstruction beyond training datasets.
arXiv Detail & Related papers (2025-03-06T02:46:10Z) - Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z) - GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
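The unrolled structure described above, alternating a physics-based data-consistency step with a learned regularizer, can be sketched with a hand-coded stand-in for the network. This is a generic unrolled-reconstruction sketch, not GLEAM's trained model: the soft-threshold operator plays the role of the neural denoiser.

```python
import numpy as np

def unrolled_recon(A, y, n_unrolls=10, step=0.1, thresh=0.01):
    """Unrolled reconstruction of x from measurements y = A x.

    Each unroll alternates:
      1. data consistency: gradient step on ||A x - y||^2 (the physics), and
      2. regularization: soft-thresholding, standing in for a learned prior.
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_unrolls):
        # Physics-based consistency with the forward model A.
        x = x - step * A.T @ (A @ x - y)
        # Stand-in regularizer (a trained network in an unrolled method).
        x = np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)
    return x

A = np.eye(3)                      # trivial forward model for illustration
y = np.array([1.0, -2.0, 3.0])
x_hat = unrolled_recon(A, y, n_unrolls=50, step=0.5, thresh=0.01)
```

In a real unrolled network the loop is fixed-length and the regularizer's parameters are trained end-to-end; the greedy strategy in GLEAM concerns how that training is made memory-efficient.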
arXiv Detail & Related papers (2022-07-18T06:01:29Z) - Least Redundant Gated Recurrent Neural Network [0.0]
We introduce a recurrent neural architecture called Deep Memory Update (DMU).
It is based on updating the previous memory state with a deep transformation of the lagged state and the network input.
Its training is stable and fast because its learning rate is tied to the size of the module.
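The update scheme described, blending the previous memory state with a transformation of the lagged state and the current input, resembles a gated recurrent step. A minimal sketch under that reading (DMU's actual transformation is deep; here it is collapsed to a single linear layer, and all names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dmu_step(h_prev, x, Wz, Wh):
    """One gated memory update: blend the previous state with a candidate
    computed from the lagged state and the current input."""
    inp = np.concatenate([h_prev, x])
    z = sigmoid(Wz @ inp)      # update gate in (0, 1)
    cand = np.tanh(Wh @ inp)   # candidate memory in (-1, 1)
    return (1.0 - z) * h_prev + z * cand

rng = np.random.default_rng(0)
dim_h, dim_x = 4, 3
Wz = rng.normal(scale=0.5, size=(dim_h, dim_h + dim_x))
Wh = rng.normal(scale=0.5, size=(dim_h, dim_h + dim_x))
h = np.zeros(dim_h)
for _ in range(10):
    h = dmu_step(h, rng.normal(size=dim_x), Wz, Wh)
# The state stays bounded: each step is a convex blend of values in [-1, 1].
```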
arXiv Detail & Related papers (2021-05-28T20:24:00Z) - Learning monocular 3D reconstruction of articulated categories from motion [39.811816510186475]
Video self-supervision forces the consistency of consecutive 3D reconstructions by a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
arXiv Detail & Related papers (2021-03-30T13:50:27Z) - SCFusion: Real-time Incremental Scene Reconstruction with Semantic
Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z) - Procrustean Regression Networks: Learning 3D Structure of Non-Rigid
Objects from 2D Annotations [42.476537776831314]
We propose a novel framework for training neural networks which is capable of learning 3D information of non-rigid objects.
The proposed framework shows superior reconstruction performance to the state-of-the-art method on the Human 3.6M, 300-VW, and SURREAL datasets.
arXiv Detail & Related papers (2020-07-21T17:29:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.