EfficientSCI: Densely Connected Network with Space-time Factorization
for Large-scale Video Snapshot Compressive Imaging
- URL: http://arxiv.org/abs/2305.10006v2
- Date: Thu, 18 May 2023 05:13:09 GMT
- Title: EfficientSCI: Densely Connected Network with Space-time Factorization
for Large-scale Video Snapshot Compressive Imaging
- Authors: Lishun Wang, Miao Cao, and Xin Yuan
- Abstract summary: We show that a UHD color video with a high compression ratio can be reconstructed from a snapshot 2D measurement using a single end-to-end deep learning model with PSNR above 32 dB.
Our method significantly outperforms all previous SOTA algorithms with better real-time performance.
- Score: 6.8372546605486555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video snapshot compressive imaging (SCI) uses a two-dimensional detector to
capture consecutive video frames during a single exposure time. Following this,
an efficient reconstruction algorithm needs to be designed to reconstruct the
desired video frames. Although recent deep learning-based state-of-the-art
(SOTA) reconstruction algorithms have achieved good results in most tasks, they
still face the following challenges due to excessive model complexity and GPU
memory limitations: 1) these models need high computational cost, and 2) they
are usually unable to reconstruct large-scale video frames at high compression
ratios. To address these issues, we develop an efficient network for video SCI
by using dense connections and a space-time factorization mechanism within a
single residual block, dubbed EfficientSCI. The EfficientSCI network can well
establish spatial-temporal correlation by using convolution in the spatial
domain and Transformer in the temporal domain, respectively. We show, for the
first time, that a UHD color video with a high compression ratio can be
reconstructed from a snapshot 2D measurement using a single end-to-end deep
learning model, with PSNR above 32 dB. Extensive results on both simulation and
real data show that our method significantly outperforms all previous SOTA
algorithms with better real-time performance. The code is at
https://github.com/ucaswangls/EfficientSCI.git.
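The SCI capture process described in the abstract (per-frame masks modulate consecutive video frames, which are then integrated into a single 2D snapshot) can be sketched in a few lines of NumPy. The sizes, random binary masks, and PSNR helper below are illustrative assumptions for clarity, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical sizes for illustration: B frames of H x W pixels
# compressed into one 2D measurement (compression ratio B).
B, H, W = 8, 256, 256
frames = np.random.rand(B, H, W)                      # video frames X_t
masks = np.random.randint(0, 2, (B, H, W)).astype(float)  # binary masks C_t

# SCI forward model: Y = sum_t C_t * X_t (element-wise modulation,
# then integration on the sensor) -> a single 2D snapshot.
measurement = np.sum(masks * frames, axis=0)

# PSNR between a reconstruction and the ground truth
# (peak signal value 1.0 for data in [0, 1]).
def psnr(x, ref, peak=1.0):
    mse = np.mean((x - ref) ** 2)
    return 10 * np.log10(peak ** 2 / mse)
```

A reconstruction algorithm such as EfficientSCI then inverts this forward model, recovering the B frames from `measurement` given the known `masks`.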
Related papers
- SIGMA:Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Deep Optics for Video Snapshot Compressive Imaging [10.830072985735175]
Video snapshot compressive imaging (SCI) aims to capture a sequence of video frames with only a single shot of a 2D detector.
This paper presents a framework to jointly optimize masks and a reconstruction network.
We believe this is a milestone for real-world video SCI.
arXiv Detail & Related papers (2024-04-08T08:04:44Z)
- A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames [54.90226700939778]
We build on the common paradigm of transferring large-scale, image-text models to video via shallow temporal fusion.
We expose two limitations of this approach: (1) decreased spatial capabilities, likely due to poor video-language alignment in standard video datasets, and (2) higher memory consumption, bottlenecking the number of frames that can be processed.
arXiv Detail & Related papers (2023-12-12T16:10:19Z)
- Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation [115.09597127418452]
Latent-Shift is an efficient text-to-video generation method based on a pretrained text-to-image generation model.
We show that Latent-Shift achieves comparable or better results while being significantly more efficient.
arXiv Detail & Related papers (2023-04-17T17:57:06Z)
- ReBotNet: Fast Real-time Video Enhancement [59.08038313427057]
Most restoration networks are slow, computationally heavy, and unsuitable for real-time video enhancement.
In this work, we design an efficient and fast framework to perform real-time enhancement for practical use-cases like live video calls and video streams.
To evaluate our method, we curate two new datasets that emulate real-world video-call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computation, reduced memory requirements, and faster inference time.
arXiv Detail & Related papers (2023-03-23T17:58:05Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also improves encoding quality from 34.07 to 34.57 dB (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis [40.249030338644225]
Video-to-Video synthesis (Vid2Vid) has achieved remarkable results in generating a photo-realistic video from a sequence of semantic maps.
Fast-Vid2Vid achieves near real-time performance at around 20 FPS and saves around 8x computational cost on a single V100 GPU.
arXiv Detail & Related papers (2022-07-11T17:57:57Z)
- Dual-view Snapshot Compressive Imaging via Optical Flow Aided Recurrent Neural Network [14.796204921975733]
Dual-view snapshot compressive imaging (SCI) aims to capture videos from two field-of-views (FoVs) in a single snapshot.
It is challenging for existing model-based decoding algorithms to reconstruct each individual scene.
We propose an optical flow-aided recurrent neural network for dual video SCI systems, which provides high-quality decoding in seconds.
arXiv Detail & Related papers (2021-09-11T14:24:44Z)
- Memory-Efficient Network for Large-scale Video Compressive Sensing [21.040260603729227]
Video snapshot compressive imaging (SCI) captures a sequence of video frames in a single shot using a 2D detector.
In this paper, we develop a memory-efficient network for large-scale video SCI based on multi-group reversible 3D convolutional neural networks.
arXiv Detail & Related papers (2021-03-04T15:14:58Z)
- Plug-and-Play Algorithms for Video Snapshot Compressive Imaging [41.818167109996885]
We consider the reconstruction problem of video snapshot compressive imaging (SCI) using a low-speed 2D sensor (detector).
The underlying principle of SCI is to modulate frames with different masks; the encoded frames are then integrated into a single snapshot on the sensor.
Applying SCI to large-scale problems (HD or UHD videos) in daily life is still challenging; one bottleneck lies in the reconstruction algorithm.
arXiv Detail & Related papers (2021-01-13T00:51:49Z)
- Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
However, temporal interpolation and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from a low-frame-rate (LFR), low-resolution (LR) video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.