MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive
Sensing
- URL: http://arxiv.org/abs/2103.01786v1
- Date: Tue, 2 Mar 2021 14:53:00 GMT
- Title: MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive
Sensing
- Authors: Zhengjue Wang and Hao Zhang and Ziheng Cheng and Bo Chen and Xin Yuan
- Abstract summary: Video snapshot compressive imaging (SCI) is a promising system, where the video frames are coded by different masks and then compressed to a snapshot measurement.
We develop a Meta Modulated Convolutional Network for SCI reconstruction, dubbed MetaSCI.
- Score: 21.243762976995544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To capture high-speed videos using a two-dimensional detector, video snapshot
compressive imaging (SCI) is a promising system, where the video frames are
coded by different masks and then compressed to a snapshot measurement.
Following this, efficient algorithms are desired to reconstruct the high-speed
frames, where the state-of-the-art results are achieved by deep learning
networks. However, these networks are usually trained for specific small-scale
masks and often have high demands of training time and GPU memory, which are
hence {\bf \em not flexible} to $i$) a new mask with the same size and $ii$) a
larger-scale mask. We address these challenges by developing a Meta Modulated
Convolutional Network for SCI reconstruction, dubbed MetaSCI. MetaSCI is
composed of a shared backbone for different masks, and light-weight
meta-modulation parameters to evolve to different modulation parameters for
each mask, thus having the properties of {\bf \em fast adaptation} to new masks
(or systems) and ready to {\bf \em scale to large data}. Extensive simulation
and real data results demonstrate the superior performance of our proposed
approach. Our code is available at
{\small\url{https://github.com/xyvirtualgroup/MetaSCI-CVPR2021}}.
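To make the sensing process concrete, here is a minimal NumPy sketch of the video SCI forward model described in the abstract (an illustration only, not the authors' implementation): each high-speed frame is modulated by its own coding mask, and the modulated frames are summed into a single 2D snapshot measurement.

```python
import numpy as np

def sci_forward(frames, masks):
    """Video SCI forward model: each frame is modulated by its own
    coding mask, then all modulated frames are summed into a single
    2D snapshot measurement.

    frames: (T, H, W) high-speed video frames
    masks:  (T, H, W) per-frame coding masks
    returns: (H, W) snapshot measurement
    """
    return np.sum(frames * masks, axis=0)

# Toy example: 8 frames of a 256x256 video with random binary masks.
rng = np.random.default_rng(0)
frames = rng.random((8, 256, 256))
masks = rng.integers(0, 2, size=(8, 256, 256)).astype(float)
snapshot = sci_forward(frames, masks)
print(snapshot.shape)  # (256, 256)
```

Reconstruction algorithms such as MetaSCI then invert this many-to-one mapping, recovering the T frames from the single snapshot and the known masks.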
Related papers
- SIGMA:Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Deep Optics for Video Snapshot Compressive Imaging [10.830072985735175]
Video snapshot compressive imaging (SCI) aims to capture a sequence of video frames with only a single shot of a 2D detector.
This paper presents a framework to jointly optimize masks and a reconstruction network.
We believe this is a milestone for real-world video SCI.
arXiv Detail & Related papers (2024-04-08T08:04:44Z)
- VideoMAC: Video Masked Autoencoders Meet ConvNets [26.723998063596635]
We present VideoMAC, a simple yet effective masked video modeling (MVM) approach with a dual-encoder architecture.
VideoMAC employs symmetric masking on randomly sampled pairs of video frames.
Empowering classical (ResNet) and modern (ConvNeXt) convolutional encoders, VideoMAC outperforms ViT-based approaches on downstream tasks.
arXiv Detail & Related papers (2024-02-29T12:09:25Z)
- Mask Propagation for Efficient Video Semantic Segmentation [63.09523058489429]
Video semantic segmentation (VSS) involves assigning a semantic label to each pixel in a video sequence.
We propose an efficient mask propagation framework for VSS, called SSSS.
Our framework reduces FLOPs by up to 4x compared to the per-frame Mask2Former, with only up to a 2% drop in mIoU on the Cityscapes validation set.
arXiv Detail & Related papers (2023-10-29T09:55:28Z)
- DMDC: Dynamic-mask-based dual camera design for snapshot Hyperspectral Imaging [3.3946853660795884]
We present a dynamic-mask-based dual camera system, which consists of an RGB camera and a CASSI system running in parallel.
First, the system learns the spatial feature distribution of the scene based on the RGB images, then instructs the SLM to encode each scene, and finally sends both RGB and CASSI images to the network for reconstruction.
We further designed the DMDC-net, which consists of two separate networks, a small-scale CNN-based dynamic mask network for dynamic adjustment of the mask and a multimodal reconstruction network for reconstruction using RGB and CASSI measurements.
arXiv Detail & Related papers (2023-08-03T05:10:58Z)
- Exploring Effective Mask Sampling Modeling for Neural Image Compression [171.35596121939238]
Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy.
Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose a novel pretraining strategy for neural image compression.
Our method achieves competitive performance with lower computational complexity compared to state-of-the-art image compression methods.
arXiv Detail & Related papers (2023-06-09T06:50:20Z)
- Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
It leads to a new paradigm for model compression to diminish the model size.
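As a rough sketch of this idea (hypothetical, not the paper's code): the weights stay fixed and random, and only a binary mask selecting a top-scoring subset of them is learned, so the model is described by the mask rather than by trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# The weights are fixed and random -- they are never updated.
W = rng.standard_normal((16, 8))       # frozen random weight matrix
scores = rng.standard_normal((16, 8))  # learnable per-weight scores

def masked_weights(W, scores, keep_ratio=0.5):
    """Keep the top `keep_ratio` fraction of weights, ranked by the
    magnitude of their learned scores; zero out the rest. The mask
    (via the scores), not W itself, is what gets trained."""
    k = int(scores.size * keep_ratio)
    threshold = np.sort(np.abs(scores), axis=None)[-k]
    mask = (np.abs(scores) >= threshold).astype(W.dtype)
    return W * mask

W_eff = masked_weights(W, scores)
print(np.count_nonzero(W_eff) / W_eff.size)  # 0.5
```

Since the random weights can be regenerated from a seed, only the mask needs to be stored, which is the source of the compression.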
arXiv Detail & Related papers (2022-10-13T03:39:03Z)
- ConvMAE: Masked Convolution Meets Masked Autoencoders [65.15953258300958]
Masked auto-encoding for feature pretraining and multi-scale hybrid convolution-transformer architectures can further unleash the potentials of ViT.
Our ConvMAE framework demonstrates that multi-scale hybrid convolution-transformer can learn more discriminative representations via the mask auto-encoding scheme.
Based on our pretrained ConvMAE models, ConvMAE-Base improves ImageNet-1K finetuning accuracy by 1.4% compared with MAE-Base.
arXiv Detail & Related papers (2022-05-08T15:12:19Z)
- Dual-view Snapshot Compressive Imaging via Optical Flow Aided Recurrent Neural Network [14.796204921975733]
Dual-view snapshot compressive imaging (SCI) aims to capture videos from two field-of-views (FoVs) in a single snapshot.
It is challenging for existing model-based decoding algorithms to reconstruct each individual scene.
We propose an optical flow-aided recurrent neural network for dual-view video SCI systems, which provides high-quality decoding in seconds.
arXiv Detail & Related papers (2021-09-11T14:24:44Z)
- Memory-Efficient Network for Large-scale Video Compressive Sensing [21.040260603729227]
Video snapshot compressive imaging (SCI) captures a sequence of video frames in a single shot using a 2D detector.
In this paper, we develop a memory-efficient network for large-scale video SCI based on multi-group reversible 3D convolutional neural networks.
arXiv Detail & Related papers (2021-03-04T15:14:58Z)
- DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation [50.70679435176346]
We propose a new mask representation by applying the discrete cosine transform(DCT) to encode the high-resolution binary grid mask into a compact vector.
Our method, termed DCT-Mask, could be easily integrated into most pixel-based instance segmentation methods.
arXiv Detail & Related papers (2020-11-19T15:00:21Z)
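A minimal NumPy sketch of the DCT-Mask encoding idea, assuming a square mask and an orthonormal DCT-II basis (illustrative only, not the paper's implementation): the high-resolution binary mask is transformed, only the low-frequency block of coefficients is kept as the compact vector, and decoding zero-pads, inverts, and thresholds.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; rows are cosine basis vectors."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def dct_encode(mask, k=32):
    """Encode a high-resolution binary mask as its k x k block of
    low-frequency 2D DCT coefficients (the compact vector)."""
    C = dct_matrix(mask.shape[0])
    return (C @ mask.astype(float) @ C.T)[:k, :k]

def dct_decode(coeffs_k, n):
    """Zero-pad the compact coefficients to n x n, invert the
    orthonormal DCT, and threshold back to a binary mask."""
    coeffs = np.zeros((n, n))
    k = coeffs_k.shape[0]
    coeffs[:k, :k] = coeffs_k
    C = dct_matrix(n)
    return (C.T @ coeffs @ C) > 0.5

# Compress a 128x128 circular mask down to 32x32 = 1024 coefficients.
yy, xx = np.mgrid[:128, :128]
mask = ((yy - 64) ** 2 + (xx - 64) ** 2) < 40 ** 2
recon = dct_decode(dct_encode(mask), 128)
```

Because a segmentation mask is spatially smooth, most of its energy lies in the low-frequency coefficients, so the thresholded reconstruction agrees with the original on the vast majority of pixels.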
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.