3D Reconstruction from Transient Measurements with Time-Resolved Transformer
- URL: http://arxiv.org/abs/2510.09205v1
- Date: Fri, 10 Oct 2025 09:44:08 GMT
- Title: 3D Reconstruction from Transient Measurements with Time-Resolved Transformer
- Authors: Yue Li, Shida Sun, Yu Hong, Feihu Xu, Zhiwei Xiong
- Abstract summary: We propose a generic Time-Resolved Transformer (TRT) architecture to boost 3D reconstruction performance in photon-efficient imaging. In this paper, we develop two task-specific embodiments: TRT-LOS for LOS imaging and TRT-NLOS for NLOS imaging. In addition, we contribute a large-scale, high-resolution synthetic LOS dataset with various noise levels and capture a set of real-world NLOS imaging measurements.
- Score: 48.73999376279579
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Transient measurements, captured by time-resolved systems, are widely employed in photon-efficient reconstruction tasks, including line-of-sight (LOS) and non-line-of-sight (NLOS) imaging. However, challenges persist in their 3D reconstruction due to the low quantum efficiency of sensors and the high noise levels, particularly for long-range or complex scenes. To boost the 3D reconstruction performance in photon-efficient imaging, we propose a generic Time-Resolved Transformer (TRT) architecture. Different from existing transformers designed for high-dimensional data, TRT has two elaborate attention designs tailored for the spatio-temporal transient measurements. Specifically, the spatio-temporal self-attention encoders explore both local and global correlations within transient data by splitting or downsampling input features into different scales. Then, the spatio-temporal cross-attention decoders integrate the local and global features in the token space, resulting in deep features with high representation capabilities. Building on TRT, we develop two task-specific embodiments: TRT-LOS for LOS imaging and TRT-NLOS for NLOS imaging. Extensive experiments demonstrate that both embodiments significantly outperform existing methods on synthetic data and real-world data captured by different imaging systems. In addition, we contribute a large-scale, high-resolution synthetic LOS dataset with various noise levels and capture a set of real-world NLOS measurements using a custom-built imaging system, enhancing the data diversity in this field. Code and datasets are available at https://github.com/Depth2World/TRT.
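The two attention designs described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the patch size, downsampling stride, and random projection matrices are illustrative stand-ins for learned layers, and the toy cube is far smaller than real transient measurements.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: (n_q, d) queries over (n_k, d) keys/values.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

rng = np.random.default_rng(0)
H, W, T, d = 4, 4, 8, 16
cube = rng.standard_normal((H, W, T))     # toy H x W x T transient measurement

# Local scale: split into 2x2 spatial patches, flatten each over space-time.
patches = [cube[i:i + 2, j:j + 2].reshape(-1)
           for i in range(0, H, 2) for j in range(0, W, 2)]
local = np.stack(patches)                 # (4, 32) local tokens

# Global scale: spatially downsample by stride 2, one token per coarse pixel.
glob = cube[::2, ::2].reshape(-1, T)      # (4, 8) global tokens

# Hypothetical learned projections into a shared d-dimensional token space.
P_loc = rng.standard_normal((local.shape[1], d)) / np.sqrt(local.shape[1])
P_glb = rng.standard_normal((glob.shape[1], d)) / np.sqrt(glob.shape[1])

# Self-attention encoders within each scale.
z_loc = attention(local @ P_loc, local @ P_loc, local @ P_loc)
z_glb = attention(glob @ P_glb, glob @ P_glb, glob @ P_glb)

# Cross-attention decoder: local queries attend to global keys/values,
# fusing fine detail with scene-level context in the token space.
fused = attention(z_loc, z_glb, z_glb)
print(fused.shape)                        # (4, 16) deep features
```

The essential idea is only that the two token streams come from different spatial scales of the same cube and are merged by cross-attention rather than concatenation.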
Related papers
- Multispectral-NeRF: a multispectral modeling approach based on neural radiance fields [3.606065291262699]
3D reconstruction techniques based on 2D images typically rely on RGB spectral information. Additional spectral bands beyond RGB have been increasingly incorporated into 3D reconstruction. Existing methods that integrate these expanded spectral data often suffer from high cost, low accuracy, and poor geometric features. We propose Multispectral-NeRF, an enhanced neural architecture derived from NeRF that can effectively integrate multispectral information.
arXiv Detail & Related papers (2025-09-14T09:04:35Z)
- Accelerating 3D Photoacoustic Computed Tomography with End-to-End Physics-Aware Neural Operators [74.65171736966131]
Photoacoustic computed tomography (PACT) combines optical contrast with ultrasonic resolution, achieving deep-tissue imaging beyond the optical diffusion limit. Current implementations require dense transducer arrays and prolonged acquisition times, limiting clinical translation. We introduce Pano, an end-to-end physics-aware model that directly learns the inverse acoustic mapping from sensor measurements to volumetric reconstructions.
arXiv Detail & Related papers (2025-09-11T23:12:55Z)
- Optimized Sampling for Non-Line-of-Sight Imaging Using Modified Fast Fourier Transforms [6.866110149269]
Non-line-of-sight (NLOS) imaging systems collect light at a diffuse relay surface and input this measurement into computational algorithms that output a 3D reconstruction. These algorithms utilize the Fast Fourier Transform (FFT) to accelerate the reconstruction process but require both input and output to be sampled spatially with uniform grids. In this work, we demonstrate that existing NLOS imaging setups typically oversample the relay surface spatially, explaining why the measurement can be compressed without sacrificing reconstruction quality.
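The oversampling claim can be sanity-checked with a 1-D toy example (my sketch, not the paper's method): a band-limited signal sampled on a dense grid survives 2x subsampling, and the dense grid is recovered exactly by zero-padding the subsampled spectrum, i.e. FFT interpolation.

```python
import numpy as np

# Dense "relay surface" sampling (1-D analogue): N points of a signal whose
# spectrum only occupies K low frequencies, with K well below the Nyquist bin.
N, K = 64, 6
rng = np.random.default_rng(1)
spec = np.zeros(N, dtype=complex)
spec[1:K + 1] = rng.standard_normal(K) + 1j * rng.standard_normal(K)
spec[-K:] = np.conj(spec[1:K + 1][::-1])   # Hermitian symmetry -> real signal
dense = np.fft.ifft(spec).real             # oversampled measurement

# Keep every 2nd sample: still above the Nyquist rate for this bandwidth.
coarse = dense[::2]                        # compressed measurement (N/2 samples)

# Recover the dense grid by zero-padding the coarse spectrum.
M = N // 2
cs = np.fft.fft(coarse)
padded = np.zeros(N, dtype=complex)
padded[:M // 2] = cs[:M // 2]
padded[-(M // 2):] = cs[-(M // 2):]
recovered = np.fft.ifft(padded).real * 2   # scale by the upsampling factor

print(np.max(np.abs(recovered - dense)))   # ~1e-16: no information was lost
```

If the subsampling stride exceeded the Nyquist limit for the signal's bandwidth, aliasing would make this recovery fail, which is exactly the regime the paper's optimized sampling is designed to stay out of.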
arXiv Detail & Related papers (2025-01-09T13:52:30Z)
- Learning Radiance Fields from a Single Snapshot Compressive Image [18.548244681485922]
Snapshot Compressive Imaging (SCI) is a technique for recovering the underlying 3D scene structure from a single temporally compressed image. We propose SCINeRF, in which we formulate the physical imaging process of SCI as part of the training of NeRF. We further integrate the popular 3D Gaussian Splatting (3DGS) framework and propose SCISplat to improve 3D scene reconstruction quality and training/rendering speed.
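The SCI forward model that this line of work embeds in training is, in essence, a masked temporal sum: each latent frame is modulated by a coding mask and all frames integrate on the sensor in one exposure. A toy version (shapes and variable names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
H, W, T = 8, 8, 4                        # toy video: T frames of H x W
frames = rng.random((T, H, W))           # latent scene frames x_t
masks = rng.integers(0, 2, (T, H, W))    # per-frame binary coding masks C_t

# Single snapshot measurement: y = sum_t C_t * x_t (elementwise product,
# summed over the exposure), giving one H x W compressed image.
y = (masks * frames).sum(axis=0)

print(y.shape)                           # (8, 8)
```

Training then renders candidate frames, pushes them through this forward model, and penalizes the mismatch with the captured snapshot y.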
arXiv Detail & Related papers (2024-12-27T06:40:44Z)
- T-3DGS: Removing Transient Objects for 3D Scene Reconstruction [83.05271859398779]
Transient objects in video sequences can significantly degrade the quality of 3D scene reconstructions. We propose T-3DGS, a novel framework that robustly filters out transient distractors during 3D reconstruction using Gaussian Splatting.
arXiv Detail & Related papers (2024-11-29T07:45:24Z)
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- Deep Learning-based Cross-modal Reconstruction of Vehicle Target from Sparse 3D SAR Image [6.499547636078961]
We introduce cross-modal learning and propose a Cross-Modal 3D-SAR Reconstruction Network (CMAR-Net) for enhancing sparse 3D SAR images of vehicle targets by fusing optical information. CMAR-Net achieves efficient training and reconstructs sparse 3D SAR images, which are derived from highly sparse-aspect observations, into visually structured 3D vehicle images.
arXiv Detail & Related papers (2024-06-06T15:18:59Z)
- SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z)
- FocDepthFormer: Transformer with latent LSTM for Depth Estimation from Focal Stack [11.433602615992516]
We present a novel Transformer-based network, FocDepthFormer, which integrates a Transformer with an LSTM module and a CNN decoder. By incorporating the LSTM, FocDepthFormer can be pre-trained on large-scale monocular RGB depth estimation datasets. Our model outperforms state-of-the-art approaches across multiple evaluation metrics.
arXiv Detail & Related papers (2023-10-17T11:53:32Z)
- A Deep Learning Approach for SAR Tomographic Imaging of Forested Areas [10.477070348391079]
We show that light-weight neural networks can be trained to perform the tomographic inversion with a single feed-forward pass.
We train our encoder-decoder network using simulated data and validate our technique on real L-band and P-band data.
arXiv Detail & Related papers (2023-01-20T14:34:03Z)
- Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while utilizing substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.