Unfolding Framework with Prior of Convolution-Transformer Mixture and
Uncertainty Estimation for Video Snapshot Compressive Imaging
- URL: http://arxiv.org/abs/2306.11316v1
- Date: Tue, 20 Jun 2023 06:25:48 GMT
- Title: Unfolding Framework with Prior of Convolution-Transformer Mixture and
Uncertainty Estimation for Video Snapshot Compressive Imaging
- Authors: Siming Zheng and Xin Yuan
- Abstract summary: We consider the problem of video snapshot compressive imaging (SCI), where sequential high-speed frames are modulated by different masks and captured by a single measurement.
By combining optimization algorithms and neural networks, deep unfolding networks (DUNs) have achieved remarkable success in solving inverse problems.
- Score: 7.601695814245209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of video snapshot compressive imaging (SCI), where
sequential high-speed frames are modulated by different masks and captured in a
single measurement. Reconstructing multiple frames from a single measurement
is an ill-posed inverse problem. By combining optimization algorithms and
neural networks, deep unfolding networks (DUNs) have achieved remarkable
success in solving inverse problems. In this paper, we work within the DUN
framework and propose a 3D Convolution-Transformer Mixture (CTM) module with
a 3D efficient and scalable attention model plugged in, which helps fully
learn the correlation between temporal and spatial dimensions by virtue of
the Transformer. To the best of our knowledge, this is the first time a
Transformer has been employed for video SCI reconstruction. In addition, to
further investigate the high-frequency information in the reconstruction
process, which was neglected in previous studies, we introduce variance
estimation that characterizes the uncertainty on a pixel-by-pixel basis.
Extensive experimental results demonstrate that our proposed method achieves
state-of-the-art (SOTA) results, with a 1.2 dB gain in PSNR over the previous
SOTA algorithm. We will release the code.
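The measurement process described in the abstract (frames modulated by different masks, then captured in a single measurement) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `sci_measure`, the array sizes, and the random binary masks are all assumptions chosen for illustration, and sensor noise is ignored.

```python
import numpy as np


def sci_measure(frames, masks):
    """Collapse B modulated frames into one 2D snapshot: Y = sum_b C_b * X_b.

    frames, masks: arrays of shape (B, H, W). Hypothetical helper, not from the paper.
    """
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must share the shape (B, H, W)")
    # Element-wise modulation of each frame, then summation over the time axis.
    return (frames * masks).sum(axis=0)


rng = np.random.default_rng(0)
B, H, W = 8, 4, 4  # assumed sizes: 8 frames of 4x4 pixels
frames = rng.random((B, H, W))                                   # stand-in video frames
masks = rng.integers(0, 2, size=(B, H, W)).astype(frames.dtype)  # assumed binary masks
snapshot = sci_measure(frames, masks)

# B*H*W unknowns must be recovered from only H*W measured values,
# which is why the reconstruction is ill-posed.
print(frames.size, "unknowns vs", snapshot.size, "measurements")
```

A reconstruction method, such as a deep unfolding network, would take `snapshot` and `masks` as input and estimate `frames`.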
Related papers
- Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging [8.819370643243012]
Coded Aperture Snapshot Spectral Imaging (CASSI) is a crucial technique for capturing three-dimensional multispectral images (MSIs).
Current state-of-the-art methods, predominantly end-to-end, face limitations in reconstructing high-frequency details.
This paper introduces a novel one-step Diffusion Probabilistic Model within a self-supervised adaptation framework for Snapshot Compressive Imaging.
arXiv Detail & Related papers (2024-09-11T17:02:10Z)
- Coarse-Fine Spectral-Aware Deformable Convolution For Hyperspectral Image Reconstruction [15.537910100051866]
We study the inverse problem of Coded Aperture Snapshot Spectral Imaging (CASSI).
We propose the Coarse-Fine Spectral-Aware Deformable Convolution Network (CFSDCN).
Our CFSDCN significantly outperforms previous state-of-the-art (SOTA) methods on both simulated and real HSI datasets.
arXiv Detail & Related papers (2024-06-18T15:15:12Z)
- Plug-and-Play Regularization on Magnitude with Deep Priors for 3D Near-Field MIMO Imaging [0.0]
Near-field radar imaging systems are used in a wide range of applications such as concealed weapon detection and medical diagnosis.
We consider the problem of reconstructing the three-dimensional (3D) complex-valued reflectivity by enforcing regularization on its magnitude.
arXiv Detail & Related papers (2023-12-26T12:25:09Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet).
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z)
- Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging [142.11622043078867]
We propose a principled Degradation-Aware Unfolding Framework (DAUF) that estimates parameters from the compressed image and physical mask, and then uses these parameters to control each iteration.
By plugging HST into DAUF, we establish the first Transformer-based deep unfolding method, Degradation-Aware Unfolding Half-Shuffle Transformer (DAUHST) for HSI reconstruction.
arXiv Detail & Related papers (2022-05-20T11:37:44Z)
- Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction [138.04956118993934]
We propose a novel Transformer-based method, coarse-to-fine sparse Transformer (CST), which embeds HSI sparsity into deep learning for HSI reconstruction.
In particular, CST uses our proposed spectra-aware screening mechanism (SASM) for coarse patch selection. The selected patches are then fed into our customized spectra-aggregation hashing multi-head self-attention (SAH-MSA) for fine pixel clustering and self-similarity capturing.
arXiv Detail & Related papers (2022-03-09T16:17:47Z)
- MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video [75.23812405203778]
Recent solutions have been introduced to estimate 3D human pose from a 2D keypoint sequence by considering body joints among all frames globally to learn spatio-temporal correlation.
We propose MixSTE, which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to learn inter-joint spatial correlation.
In addition, the network output is extended from the central frame to all frames of the input video, improving the coherence between the input and output sequences.
arXiv Detail & Related papers (2022-03-02T04:20:59Z)
- Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation [18.14237514372724]
We propose a new framework to generate 3D human pose and mesh from RGB videos.
We train a two-stream temporal network based on transformer to predict SMPL parameters.
The proposed algorithm is extensively evaluated on the Human3.6M and 3DPW datasets.
arXiv Detail & Related papers (2021-10-22T10:01:13Z)
- Dense Deep Unfolding Network with 3D-CNN Prior for Snapshot Compressive Imaging [6.289143409131908]
Snapshot compressive imaging (SCI) aims to record three-dimensional signals via a two-dimensional camera.
We present a novel dense deep unfolding network (DUN) with 3D-CNN prior for SCI.
To promote network adaptation, we propose a dense feature map adaption (DFMA) module.
arXiv Detail & Related papers (2021-09-14T09:42:42Z)
- Learning a Model-Driven Variational Network for Deformable Image Registration [89.9830129923847]
VR-Net is a novel cascaded variational network for unsupervised deformable image registration.
It outperforms state-of-the-art deep learning methods on registration accuracy.
It maintains the fast inference speed of deep learning and the data efficiency of variational models.
arXiv Detail & Related papers (2021-05-25T21:37:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.