Long-Term Multi-Session 3D Reconstruction Under Substantial Appearance Change
- URL: http://arxiv.org/abs/2602.20584v1
- Date: Tue, 24 Feb 2026 06:12:51 GMT
- Title: Long-Term Multi-Session 3D Reconstruction Under Substantial Appearance Change
- Authors: Beverley Gorry, Tobias Fischer, Michael Milford, Alejandro Fontan
- Abstract summary: Long-term environmental monitoring requires the ability to reconstruct and align 3D models across repeated site visits separated by months or years. Existing approaches rely on post-hoc alignment of independently reconstructed sessions. We propose enforcing cross-session correspondences directly within a joint SfM reconstruction.
- Score: 52.46888249268445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-term environmental monitoring requires the ability to reconstruct and align 3D models across repeated site visits separated by months or years. However, existing Structure-from-Motion (SfM) pipelines implicitly assume near-simultaneous image capture and limited appearance change, and therefore fail when applied to long-term monitoring scenarios such as coral reef surveys, where substantial visual and structural change is common. In this paper, we show that the primary limitation of current approaches lies in their reliance on post-hoc alignment of independently reconstructed sessions, which is insufficient under large temporal appearance change. We address this limitation by enforcing cross-session correspondences directly within a joint SfM reconstruction. Our approach combines complementary handcrafted and learned visual features to robustly establish correspondences across large temporal gaps, enabling the reconstruction of a single coherent 3D model from imagery captured years apart, where standard independent and joint SfM pipelines break down. We evaluate our method on long-term coral reef datasets exhibiting significant real-world change, and demonstrate consistent joint reconstruction across sessions in cases where existing methods fail to produce coherent reconstructions. To ensure scalability to large datasets, we further restrict expensive learned feature matching to a small set of likely cross-session image pairs identified via visual place recognition, which reduces computational cost and improves alignment robustness.
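The abstract's scalability step, restricting expensive learned feature matching to likely cross-session pairs found via visual place recognition, can be sketched as a descriptor-similarity search. The following is a minimal illustration, not the authors' implementation: it assumes precomputed global image descriptors (e.g. from a place-recognition network) as NumPy arrays, and the function name and parameters are hypothetical.

```python
import numpy as np

def select_cross_session_pairs(desc_a, desc_b, top_k=2, min_sim=0.5):
    """Rank candidate cross-session image pairs by global-descriptor
    similarity, so expensive learned matching runs only on the shortlist.

    desc_a: (N, D) descriptors for session A; desc_b: (M, D) for session B.
    Returns (i, j, similarity) tuples, at most top_k per session-A image,
    keeping only pairs with cosine similarity >= min_sim.
    """
    # L2-normalise so the dot product equals cosine similarity.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T  # (N, M) cosine-similarity matrix
    pairs = []
    for i in range(sim.shape[0]):
        # Best top_k session-B candidates for session-A image i.
        for j in np.argsort(sim[i])[::-1][:top_k]:
            if sim[i, j] >= min_sim:
                pairs.append((i, int(j), float(sim[i, j])))
    return pairs

# Toy data: images 0 and 1 in each session depict the same two places,
# with small noise standing in for appearance change between visits.
rng = np.random.default_rng(0)
base = rng.normal(size=(2, 8))
desc_a = base + 0.01 * rng.normal(size=(2, 8))
desc_b = base + 0.01 * rng.normal(size=(2, 8))
pairs = select_cross_session_pairs(desc_a, desc_b, top_k=1)
print(pairs)  # near-identical images pair up: (0, 0, ...) and (1, 1, ...)
```

In a real pipeline the shortlist would feed a learned matcher, whose correspondences are then injected alongside handcrafted-feature matches into a single joint SfM match graph; the quadratic pair enumeration is replaced by this linear-in-top_k shortlist.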
Related papers
- OnlineX: Unified Online 3D Reconstruction and Understanding with Active-to-Stable State Evolution [34.8105632078785]
We introduce OnlineX, a feed-forward framework that reconstructs both 3D visual appearance and language fields in an online manner using only streaming images. Our framework decouples the memory state into a dedicated active state and a persistent stable state, and then cohesively fuses the information from the former into the latter to achieve both fidelity and stability.
arXiv Detail & Related papers (2026-03-02T17:52:02Z)
- A Decomposition-based State Space Model for Multivariate Time-Series Forecasting [0.0]
We propose an end-to-end decomposition framework using three parallel deep state space model branches to capture trend, seasonal, and residual components. Across standard benchmarks, DecompSSM outperformed strong baselines, indicating the effectiveness of combining component-wise deep state space models and global context refinement.
arXiv Detail & Related papers (2026-02-05T07:17:08Z)
- InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting [64.42884719282323]
InpaintHuman is a novel method for generating high-fidelity, complete, and animatable avatars from occluded monocular videos. Our approach employs direct pixel-level supervision to ensure identity fidelity.
arXiv Detail & Related papers (2026-01-05T13:26:02Z)
- Morphing Through Time: Diffusion-Based Bridging of Temporal Gaps for Robust Alignment in Change Detection [51.56484100374058]
We introduce a modular pipeline that improves spatial and temporal robustness without altering existing change detection networks. A diffusion module synthesizes intermediate morphing frames that bridge large appearance gaps, enabling RoMa to estimate stepwise correspondences. Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show consistent gains in both registration accuracy and downstream change detection.
arXiv Detail & Related papers (2025-11-11T08:40:28Z)
- VEIGAR: View-consistent Explicit Inpainting and Geometry Alignment for 3D Object Removal [2.8954284913103367]
Novel View Synthesis (NVS) and 3D generation have significantly improved editing tasks. To maintain cross-view consistency throughout the generative process, methods typically address this challenge using a dual-strategy framework. We present VEIGAR, a computationally efficient framework that outperforms existing methods without relying on an initial reconstruction phase.
arXiv Detail & Related papers (2025-06-13T11:31:44Z)
- StateSpaceDiffuser: Bringing Long Context to Diffusion World Models [52.92249035412797]
We introduce StateSpaceDiffuser, where a diffusion model is enabled to perform long-context tasks by integrating features from a state-space model. This design restores long-term memory while preserving the high-fidelity synthesis of diffusion models.
arXiv Detail & Related papers (2025-05-28T11:27:54Z)
- TransformerLSR: Attentive Joint Model of Longitudinal Data, Survival, and Recurrent Events with Concurrent Latent Structure [35.54001561725239]
We develop TransformerLSR, a flexible transformer-based deep modeling and inference framework to jointly model all three components simultaneously.
We demonstrate the effectiveness and necessity of TransformerLSR through simulation studies and analyzing a real-world medical dataset on patients after kidney transplantation.
arXiv Detail & Related papers (2024-04-04T20:51:37Z)
- REGAS: REspiratory-GAted Synthesis of Views for Multi-Phase CBCT Reconstruction from a Single 3D CBCT Acquisition [75.64791080418162]
REGAS proposes a self-supervised method to synthesize the undersampled tomographic views and mitigate aliasing artifacts in reconstructed images.
To address the large memory cost of deep neural networks on high resolution 4D data, REGAS introduces a novel Ray Path Transformation (RPT) that allows for distributed, differentiable forward projections.
arXiv Detail & Related papers (2022-08-17T03:42:19Z)
- Non-local Recurrent Regularization Networks for Multi-view Stereo [108.17325696835542]
In deep multi-view stereo networks, cost regularization is crucial to achieve accurate depth estimation.
We propose a novel non-local recurrent regularization network for multi-view stereo, named NR2-Net.
Our method achieves state-of-the-art reconstruction results on both DTU and Tanks and Temples datasets.
arXiv Detail & Related papers (2021-10-13T01:43:54Z)
- VIO-Aided Structure from Motion Under Challenging Environments [12.111638631118026]
We present a robust and efficient Structure from Motion pipeline for accurate 3D reconstruction under challenging environments.
Specifically, we propose a geometric verification method to filter out mismatches by considering the prior geometric configuration of candidate image pairs.
arXiv Detail & Related papers (2021-01-24T06:35:52Z)
- Supporting Optimal Phase Space Reconstructions Using Neural Network Architecture for Time Series Modeling [68.8204255655161]
We propose an artificial neural network with a mechanism to implicitly learn the phase space's properties.
Our approach is either as competitive as or better than most state-of-the-art strategies.
arXiv Detail & Related papers (2020-06-19T21:04:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.