Related papers: LONG3R: Long Sequence Streaming 3D Reconstruction

URL: http://arxiv.org/abs/2507.18255v1
Date: Thu, 24 Jul 2025 09:55:20 GMT
Title: LONG3R: Long Sequence Streaming 3D Reconstruction
Authors: Zhuoguang Chen, Minghui Qin, Tianyuan Yuan, Zhe Liu, Hang Zhao
Abstract summary: LONG3R is a novel model designed for streaming multi-view 3D scene reconstruction over longer sequences. Our model achieves real-time processing by operating recurrently, maintaining and updating memory with each new observation. Experiments demonstrate that LONG3R outperforms state-of-the-art streaming methods, particularly for longer sequences.
Score: 29.79885827038617
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in multi-view scene reconstruction have been significant, yet existing methods face limitations when processing streams of input images. These methods either rely on time-consuming offline optimization or are restricted to shorter sequences, hindering their applicability in real-time scenarios. In this work, we propose LONG3R (LOng sequence streaming 3D Reconstruction), a novel model designed for streaming multi-view 3D scene reconstruction over longer sequences. Our model achieves real-time processing by operating recurrently, maintaining and updating memory with each new observation. We first employ a memory gating mechanism to filter relevant memory, which, together with a new observation, is fed into a dual-source refined decoder for coarse-to-fine interaction. To effectively capture long-sequence memory, we propose a 3D spatio-temporal memory that dynamically prunes redundant spatial information while adaptively adjusting resolution along the scene. To enhance our model's performance on long sequences while maintaining training efficiency, we employ a two-stage curriculum training strategy, each stage targeting specific capabilities. Experiments demonstrate that LONG3R outperforms state-of-the-art streaming methods, particularly for longer sequences, while maintaining real-time inference speed. Project page: https://zgchen33.github.io/LONG3R/.
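
The recurrent loop described in the abstract (filter the relevant memory, process the new observation, then update and prune the memory) can be illustrated with a toy sketch. This is not the authors' implementation: the class `StreamingMemory`, the cosine-similarity gate, the fixed `voxel_size`, and the "newest entry wins" pruning rule are all assumptions chosen for illustration, and the paper's learned gating, dual-source refined decoder, and adaptive resolution are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


@dataclass
class MemoryEntry:
    position: Point        # 3D location of the memory token
    feature: List[float]   # feature vector attached to that location
    frame_index: int       # frame that last wrote this entry


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


@dataclass
class StreamingMemory:
    """Hypothetical spatial memory: at most one entry per voxel."""
    voxel_size: float = 0.5       # assumed fixed; the paper adapts resolution
    gate_threshold: float = 0.5   # assumed hand-set; the paper learns its gate
    entries: Dict[VoxelKey, MemoryEntry] = field(default_factory=dict)

    def _key(self, p: Point) -> VoxelKey:
        # Floor division maps each coordinate to its voxel index.
        return tuple(int(c // self.voxel_size) for c in p)

    def gate(self, query: List[float]) -> List[MemoryEntry]:
        """Keep only the entries whose features resemble the query."""
        return [e for e in self.entries.values()
                if cosine(e.feature, query) >= self.gate_threshold]

    def update(self, observations, frame_index: int) -> None:
        """Insert new tokens; a newer token replaces the one in its voxel."""
        for position, feature in observations:
            self.entries[self._key(position)] = MemoryEntry(
                position, feature, frame_index)


# Usage: two nearby tokens collapse into one voxel, so memory stays bounded
# by the number of occupied voxels rather than by the number of frames.
memory = StreamingMemory(voxel_size=1.0, gate_threshold=0.9)
memory.update([((0.1, 0.1, 0.1), [1.0, 0.0]),
               ((0.2, 0.2, 0.2), [1.0, 0.0]),
               ((5.0, 0.0, 0.0), [0.0, 1.0])], frame_index=0)
relevant = memory.gate([1.0, 0.0])
```

Keying the memory by voxel is one simple way to discard spatially redundant tokens as frames accumulate; any real system would replace the threshold gate and overwrite rule with learned components.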

Related papers

RELIC: Interactive Video World Model with Long-Horizon Memory [74.81433479334821]
A truly interactive world model requires real-time long-horizon streaming, consistent spatial memory, and precise user control. We present RELIC, a unified framework that tackles these three challenges altogether. Given a single image and a text description, RELIC enables memory-aware, long-duration exploration of arbitrary scenes in real time.
arXiv Detail & Related papers (2025-12-03T18:29:20Z)
Cross-Temporal 3D Gaussian Splatting for Sparse-View Guided Scene Update [17.581193784542357]
Updating 3D scenes from sparse-view observations is crucial for various real-world applications. We propose Cross-Temporal 3D Gaussian Splatting (Cross-Temporal 3DGS), a novel framework for efficiently reconstructing and updating 3D scenes. Experimental results show significant improvements over baseline methods in reconstruction quality and data efficiency.
arXiv Detail & Related papers (2025-11-29T16:00:24Z)
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer [72.88105562624838]
We present STream3R, a novel approach to 3D reconstruction that reformulates pointmap prediction as a decoder-only Transformer problem. By learning geometric priors from large-scale 3D datasets, STream3R generalizes well to diverse and challenging scenarios. Our results underscore the potential of causal Transformer models for online 3D perception, paving the way for real-time 3D understanding in streaming environments.
arXiv Detail & Related papers (2025-08-14T17:58:05Z)
mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling [0.5236468296934584]
mGRADE is a hybrid-memory system that integrates a temporal 1D-convolution with learnable spacings followed by a minimal gated recurrent unit. We demonstrate that mGRADE effectively separates and preserves multi-scale temporal features. This highlights mGRADE's promise as an efficient solution for memory-constrained multi-scale temporal processing at the edge.
arXiv Detail & Related papers (2025-07-02T15:44:35Z)
Long-Sequence Memory with Temporal Kernels and Dense Hopfield Functionals [0.0]
Building upon earlier work on long-sequence Hopfield memory models, we propose a temporal kernel $K(m, k)$ to incorporate temporal dependencies. We demonstrate the successful application of this technique for the storage and sequential retrieval of movie frames.
arXiv Detail & Related papers (2025-06-27T15:57:58Z)
LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering [68.93333348474988]
We present a novel level-of-detail (LOD) method for 3D Gaussian Splatting on memory-constrained devices. Our approach iteratively selects optimal subsets of Gaussians based on camera distance. Our method achieves state-of-the-art performance on both outdoor (Hierarchical 3DGS) and indoor (Zip-NeRF) datasets.
arXiv Detail & Related papers (2025-05-29T06:50:57Z)
Exploiting Temporal State Space Sharing for Video Semantic Segmentation [53.8810901249897]
Video semantic segmentation (VSS) plays a vital role in understanding the temporal evolution of scenes. Traditional methods often segment videos frame-by-frame or in a short temporal window, leading to limited temporal context, redundant computations, and heavy memory requirements. We introduce a Temporal Video State Space Sharing architecture to leverage Mamba state space models for temporal feature sharing. Our model features a selective gating mechanism that efficiently propagates relevant information across video frames, eliminating the need for a memory-heavy feature pool.
arXiv Detail & Related papers (2025-03-26T01:47:42Z)
VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment [54.66217340264935]
VideoLifter is a novel video-to-3D pipeline that leverages a local-to-global strategy on a fragment basis. It significantly accelerates the reconstruction process, reducing training time by over 82% while holding better visual quality than current SOTA methods.
arXiv Detail & Related papers (2025-01-03T18:52:36Z)
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD [27.472705540825316]
This paper is on long-term video understanding, where the goal is to recognise human actions over long temporal windows (up to minutes long). We propose an alternative to attention-based schemes which is based on a low-rank approximation of the memory obtained using Singular Value Decomposition. Our scheme has two advantages: (a) it reduces complexity by more than an order of magnitude, and (b) it is amenable to an efficient implementation for the calculation of the memory bases.
arXiv Detail & Related papers (2024-06-11T12:03:57Z)
Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream. At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank. To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling. Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation. T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed. On the UCF101 action recognition benchmark, our method improves on state-of-the-art real-time methods by 5.4% in accuracy and is 2 times faster at inference, with a model that takes less than 5MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.