DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba
- URL: http://arxiv.org/abs/2408.10679v2
- Date: Mon, 18 Nov 2024 07:46:35 GMT
- Title: DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba
- Authors: Shuning Xu, Xina Liu, Binbin Song, Xiangyu Chen, Qiubo Chen, Jiantao Zhou,
- Abstract summary: Moire patterns, resulting from the interference of two similar repetitive patterns, are frequently observed during the capture of images or videos on screens.
This paper introduces a novel alignment-free raw video demoireing network with frequency-assisted-temporal Mamba.
Our proposed DemMamba surpasses state-of-the-art methods by 1.3 dB in PSNR, and also provides a satisfactory visual experience.
- Score: 18.06907326360215
- License:
- Abstract: Moire patterns, resulting from the interference of two similar repetitive patterns, are frequently observed during the capture of images or videos on screens. These patterns vary in color, shape, and location across video frames, posing challenges in extracting information from adjacent frames and preserving temporal consistency throughout the restoration process. Existing deep learning methods often depend on well-designed alignment modules, such as optical flow estimation, deformable convolution, and cross-frame self-attention layers, incurring high computational costs. Recent studies indicate that utilizing raw data as input can significantly improve the effectiveness of video demoireing by providing the pristine degradation information and more detailed content. However, previous works fail to design both efficient and effective raw video demoireing methods that can maintain temporal consistency and prevent degradation of color and spatial details. This paper introduces a novel alignment-free raw video demoireing network with frequency-assisted spatio-temporal Mamba (DemMamba). It features sequentially arranged Spatial Mamba Blocks (SMB) and Temporal Mamba Blocks (TMB) to effectively model the inter- and intra-relationships in raw videos affected by moire patterns. An Adaptive Frequency Block (AFB) within the SMB facilitates demoireing in the frequency domain, while a Channel Attention Block (CAB) in the TMB enhances the temporal information interactions by leveraging inter-channel relationships among features. Extensive experiments demonstrate that our proposed DemMamba surpasses state-of-the-art methods by 1.3 dB in PSNR, and also provides a satisfactory visual experience.
Related papers
- Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models [64.2445487645478]
Large Language Models have shown remarkable efficacy in generating streaming data such as text and audio.
We present Live2Diff, the first attempt at designing a video diffusion model with uni-directional temporal attention, specifically targeting live streaming video translation.
arXiv Detail & Related papers (2024-07-11T17:34:51Z) - DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and flops.
arXiv Detail & Related papers (2024-06-05T06:18:03Z) - Collaborative Feedback Discriminative Propagation for Video Super-Resolution [66.61201445650323]
Key success of video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information.
Inaccurate alignment usually leads to aligned features with significant artifacts.
propagation modules only propagate the same timestep features forward or backward.
arXiv Detail & Related papers (2024-04-06T22:08:20Z) - Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection [19.643936110623653]
Video Anomaly Detection (VAD) aims to identify abnormalities within a specific context and timeframe.
Recent deep learning-based VAD models have shown promising results by generating high-resolution frames.
We propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task.
arXiv Detail & Related papers (2024-03-28T03:07:16Z) - Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z) - APLA: Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency [9.07931905323022]
We propose a novel text-to-video (T2V) generation network structure based on diffusion models.
Our approach only necessitates a single video as input and builds upon pre-trained stable diffusion networks.
We leverage a hybrid architecture of transformers and convolutions to compensate for temporal intricacies, enhancing consistency between different frames within the video.
arXiv Detail & Related papers (2023-08-24T07:11:00Z) - Spatiotemporal Inconsistency Learning for DeepFake Video Detection [51.747219106855624]
We present a novel temporal modeling paradigm in TIM by exploiting the temporal difference over adjacent frames along with both horizontal and vertical directions.
And the ISM simultaneously utilizes the spatial information from SIM and temporal information from TIM to establish a more comprehensive spatial-temporal representation.
arXiv Detail & Related papers (2021-09-04T13:05:37Z) - Deep Video Matting via Spatio-Temporal Alignment and Aggregation [63.6870051909004]
We propose a deep learning-based video matting framework which employs a novel aggregation feature module (STFAM)
To eliminate frame-by-frame trimap annotations, a lightweight interactive trimap propagation network is also introduced.
Our framework significantly outperforms conventional video matting and deep image matting methods.
arXiv Detail & Related papers (2021-04-22T17:42:08Z) - Efficient Two-Stream Network for Violence Detection Using Separable
Convolutional LSTM [0.0]
We propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet.
SepConvLSTM is constructed by replacing convolution operation at each gate of ConvLSTM with a depthwise separable convolution.
Our model outperforms the accuracy on the larger and more challenging RWF-2000 dataset by more than a 2% margin.
arXiv Detail & Related papers (2021-02-21T12:01:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.