Alignment-free Raw Video Demoireing
- URL: http://arxiv.org/abs/2408.10679v3
- Date: Sun, 10 Aug 2025 15:13:03 GMT
- Title: Alignment-free Raw Video Demoireing
- Authors: Shuning Xu, Xina Liu, Binbin Song, Xiangyu Chen, Qiubo Chen, Jiantao Zhou
- Abstract summary: Video demoireing aims to remove undesirable interference patterns that arise during the capture of screen content. This paper introduces a novel alignment-free raw video demoireing network with frequency-assisted spatio-temporal Mamba (DemMamba). It surpasses state-of-the-art methods by 1.6 dB in PSNR, and also delivers a satisfactory visual experience.
- Score: 18.06907326360215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video demoireing aims to remove undesirable interference patterns that arise during the capture of screen content, restoring artifact-free frames while maintaining temporal consistency. Existing video demoireing methods typically utilize carefully designed alignment modules to estimate inter-frame motion for leveraging temporal information; however, these modules are often complex and computationally demanding. Meanwhile, recent works indicate that using raw data as input significantly enhances demoireing performance. Building on this insight, this paper introduces a novel alignment-free raw video demoireing network with frequency-assisted spatio-temporal Mamba (DemMamba). It incorporates sequentially arranged Spatial Mamba Blocks (SMB) and Temporal Mamba Blocks (TMB) to effectively model the inter- and intra-relationships in raw video demoireing. The SMB employs a multi-directional scanning mechanism coupled with a learnable frequency compressor to effectively differentiate interference patterns across various orientations and frequencies, resulting in reduced artifacts, sharper edges, and faithful texture reconstruction. Concurrently, the TMB enhances temporal consistency by performing bidirectional scanning across the temporal sequences and integrating channel attention techniques, facilitating improved temporal information fusion. Extensive experiments demonstrate that DemMamba surpasses state-of-the-art methods by 1.6 dB in PSNR, and also delivers a satisfactory visual experience.
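To make the described block arrangement concrete, below is a minimal, hypothetical PyTorch sketch of the structure the abstract outlines: spatial blocks that scan feature maps along multiple directions and apply a learnable frequency compressor, followed by temporal blocks that scan bidirectionally along the time axis and apply channel attention. The selective state-space (Mamba) layer is replaced by a bidirectional GRU purely as a runnable stand-in; all module names, shapes, and hyper-parameters are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LearnableFrequencyCompressor(nn.Module):
    """Re-weights spatial-frequency components with a learnable mask (assumed design)."""

    def __init__(self, channels, height, width):
        super().__init__()
        self.mask = nn.Parameter(torch.zeros(channels, height, width // 2 + 1))

    def forward(self, x):                                   # x: (B, C, H, W)
        freq = torch.fft.rfft2(x, norm="ortho")
        freq = freq * torch.sigmoid(self.mask)              # suppress selected frequency bands
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")


class SpatialBlock(nn.Module):
    """Multi-directional scanning over spatial tokens plus frequency compression."""

    def __init__(self, channels, height, width):
        super().__init__()
        self.seq = nn.GRU(channels, channels, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * channels, channels)
        self.freq = LearnableFrequencyCompressor(channels, height, width)

    def _scan(self, tokens):                                # tokens: (B, L, C)
        out, _ = self.seq(tokens)
        return self.proj(out)

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        row = self._scan(x.flatten(2).transpose(1, 2))                    # row-major scan
        col = self._scan(x.transpose(2, 3).flatten(2).transpose(1, 2))    # column-major scan
        row = row.transpose(1, 2).reshape(b, c, h, w)
        col = col.transpose(1, 2).reshape(b, c, w, h).transpose(2, 3)
        return x + self.freq(row + col)


class TemporalBlock(nn.Module):
    """Bidirectional scan along time at every spatial location plus channel attention."""

    def __init__(self, channels):
        super().__init__()
        self.seq = nn.GRU(channels, channels, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * channels, channels)
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(channels, channels, 1),
                                  nn.Sigmoid())

    def forward(self, x):                                   # x: (B, T, C, H, W)
        b, t, c, h, w = x.shape
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        out, _ = self.seq(tokens)                           # forward + backward temporal scan
        out = self.proj(out).reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
        out = out.reshape(b * t, c, h, w)
        out = out * self.attn(out)                          # channel attention for fusion
        return x + out.reshape(b, t, c, h, w)


# Toy usage on a 5-frame clip of 4-channel packed RAW patches (shapes assumed).
clip = torch.randn(1, 5, 4, 64, 64)
smb, tmb = SpatialBlock(4, 64, 64), TemporalBlock(4)
restored = tmb(torch.stack([smb(f) for f in clip.unbind(1)], dim=1))
```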
Related papers
- MoCHA-former: Moiré-Conditioned Hybrid Adaptive Transformer for Video Demoiréing [9.869634509510014]
Frequency aliasing between the camera's color filter array (CFA) and the display's sub-pixels induces moiré patterns that severely degrade captured photos and videos. MoCHA-former comprises two key components: Decoupled Moiré Adaptive Demoiréing (DMAD) and Spatio-Temporal Adaptive Demoiréing (STAD). We analyze moiré characteristics through qualitative and quantitative studies, and evaluate on two video datasets covering RAW and sRGB domains.
arXiv Detail & Related papers (2025-08-20T04:42:07Z) - DiTVR: Zero-Shot Diffusion Transformer for Video Restoration [48.97196894658511]
DiTVR is a zero-shot video restoration framework that couples a diffusion transformer with trajectory-aware attention and a flow-consistent sampler. Our attention mechanism aligns tokens along optical flow trajectories, with particular emphasis on vital layers that exhibit the highest sensitivity to temporal dynamics. The flow-guided sampler injects data consistency only into low-frequency bands, preserving high-frequency priors while accelerating sampling via caching.
arXiv Detail & Related papers (2025-08-11T09:54:45Z) - Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing [92.61216319417208]
We propose a novel frequency domain-based diffusion model for fully exploiting the beneficial knowledge in unpaired clear data. Inspired by the strong generative ability shown by Diffusion Models (DMs), we tackle the dehazing task from the perspective of frequency domain reconstruction.
arXiv Detail & Related papers (2025-07-02T01:22:46Z) - VSRM: A Robust Mamba-Based Framework for Video Super-Resolution [1.8506868409351092]
Video super-resolution remains a major challenge in low-level vision tasks. In this work, we propose VSRM, a novel framework for processing long sequences in video. VSRM achieves state-of-the-art results on diverse benchmarks, establishing itself as a solid foundation for future research.
arXiv Detail & Related papers (2025-06-28T05:51:42Z) - Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoiréing through targeted frequency separation. Our method performs an effective frequency decomposition that explicitly splits moiré patterns into high-frequency spatially-localized textures and low-frequency scale-robust color distortions. Experiments on various demoiréing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
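The high/low-frequency split described in this summary can be illustrated with a simple blur-based decomposition. This is only an illustrative stand-in, not Freqformer's actual operator; the function name, kernel size, and sigma are assumptions.

```python
import torch
import torch.nn.functional as F


def frequency_split(img, kernel_size=11, sigma=3.0):
    """Split (B, C, H, W) into a blurred low-frequency part and a high-frequency residual."""
    coords = torch.arange(kernel_size, dtype=img.dtype) - kernel_size // 2
    gauss = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    gauss = gauss / gauss.sum()
    kernel = (gauss[:, None] * gauss[None, :]).expand(img.shape[1], 1, -1, -1).contiguous()
    padded = F.pad(img, [kernel_size // 2] * 4, mode="reflect")
    low = F.conv2d(padded, kernel, groups=img.shape[1])   # low-pass: scale-robust color distortions
    return low, img - low                                 # residual: localized moiré-like textures


low, high = frequency_split(torch.rand(1, 3, 128, 128))
```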
arXiv Detail & Related papers (2025-05-25T12:23:10Z) - FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection [4.015022008487465]
Large-scale pre-trained video encoders tend to introduce background clutter and irrelevant semantics, leading to context confusion and imprecise boundaries. We propose a frequency-aware decoupling network that improves action discriminability by filtering out noisy semantics captured by pre-trained models. Our method achieves state-of-the-art performance on temporal action detection benchmarks.
arXiv Detail & Related papers (2025-04-01T10:57:37Z) - Temporal-Consistent Video Restoration with Pre-trained Diffusion Models [51.47188802535954]
Video restoration (VR) aims to recover high-quality videos from degraded ones.
Recent zero-shot VR methods using pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and insufficient temporal consistency.
We present a novel Maximum a Posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors.
arXiv Detail & Related papers (2025-03-19T03:41:56Z) - Spatial Degradation-Aware and Temporal Consistent Diffusion Model for Compressed Video Super-Resolution [25.615935776826596]
Due to storage and bandwidth limitations, videos transmitted over the Internet often exhibit low quality, characterized by low resolution and compression artifacts. Although video super-resolution (VSR) is an efficient video enhancement technique, existing VSR methods focus less on compressed videos. We propose a novel method that exploits the priors of pre-trained diffusion models for compressed VSR.
arXiv Detail & Related papers (2025-02-11T08:57:45Z) - MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling [4.736059095502584]
This paper proposes a novel multi-stream approach that integrates complementary pre-processing techniques such as gamma correction and histogram equalization alongside raw dark frames.
Extensive experiments on the ARID V1.0 and ARID V1.5 dark video datasets show that MD-BERT outperforms existing methods.
arXiv Detail & Related papers (2025-02-06T02:26:47Z) - Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models [64.2445487645478]
Large Language Models have shown remarkable efficacy in generating streaming data such as text and audio.
We present Live2Diff, the first attempt at designing a video diffusion model with uni-directional temporal attention, specifically targeting live streaming video translation.
arXiv Detail & Related papers (2024-07-11T17:34:51Z) - DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and FLOPs.
arXiv Detail & Related papers (2024-06-05T06:18:03Z) - Collaborative Feedback Discriminative Propagation for Video Super-Resolution [66.61201445650323]
The key success of video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information.
Inaccurate alignment usually leads to aligned features with significant artifacts.
Existing propagation modules only propagate features of the same timestep forward or backward.
arXiv Detail & Related papers (2024-04-06T22:08:20Z) - Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection [19.643936110623653]
Video Anomaly Detection (VAD) aims to identify abnormalities within a specific context and timeframe.
Recent deep learning-based VAD models have shown promising results by generating high-resolution frames.
We propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task.
arXiv Detail & Related papers (2024-03-28T03:07:16Z) - Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z) - APLA: Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency [9.07931905323022]
We propose a novel text-to-video (T2V) generation network structure based on diffusion models.
Our approach only necessitates a single video as input and builds upon pre-trained stable diffusion networks.
We leverage a hybrid architecture of transformers and convolutions to compensate for temporal intricacies, enhancing consistency between different frames within the video.
arXiv Detail & Related papers (2023-08-24T07:11:00Z) - Spatiotemporal Inconsistency Learning for DeepFake Video Detection [51.747219106855624]
We present a novel temporal modeling paradigm in TIM by exploiting the temporal difference over adjacent frames along both horizontal and vertical directions.
And the ISM simultaneously utilizes the spatial information from SIM and temporal information from TIM to establish a more comprehensive spatial-temporal representation.
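As an illustration of the temporal-difference idea this summary mentions, the following sketch computes adjacent-frame differences and their horizontal and vertical gradients. The function name and tensor shapes are assumptions, not the paper's implementation.

```python
import torch


def temporal_spatial_differences(clip):
    """clip: (B, T, C, H, W) -> horizontal and vertical gradients of adjacent-frame differences."""
    diff = clip[:, 1:] - clip[:, :-1]                 # temporal difference over adjacent frames
    horiz = diff[..., :, 1:] - diff[..., :, :-1]      # gradient along the horizontal direction
    vert = diff[..., 1:, :] - diff[..., :-1, :]       # gradient along the vertical direction
    return horiz, vert


h_cue, v_cue = temporal_spatial_differences(torch.rand(2, 8, 3, 64, 64))
```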
arXiv Detail & Related papers (2021-09-04T13:05:37Z) - Deep Video Matting via Spatio-Temporal Alignment and Aggregation [63.6870051909004]
We propose a deep learning-based video matting framework which employs a novel spatio-temporal feature aggregation module (ST-FAM).
To eliminate frame-by-frame trimap annotations, a lightweight interactive trimap propagation network is also introduced.
Our framework significantly outperforms conventional video matting and deep image matting methods.
arXiv Detail & Related papers (2021-04-22T17:42:08Z) - Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM [0.0]
We propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet.
SepConvLSTM is constructed by replacing the convolution operation at each gate of ConvLSTM with a depthwise separable convolution.
Our model outperforms the accuracy of previous works on the larger and more challenging RWF-2000 dataset by more than a 2% margin.
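The gate replacement described above can be sketched as a minimal cell; this is a hypothetical illustration, with hyper-parameters and naming assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn


class SepConvLSTMCell(nn.Module):
    """ConvLSTM cell whose gate convolution is depthwise separable (depthwise + pointwise)."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        io = in_ch + hid_ch
        self.depthwise = nn.Conv2d(io, io, k, padding=k // 2, groups=io)
        self.pointwise = nn.Conv2d(io, 4 * hid_ch, kernel_size=1)

    def forward(self, x, state):
        h, c = state                                   # hidden / cell state: (B, hid_ch, H, W)
        gates = self.pointwise(self.depthwise(torch.cat([x, h], dim=1)))
        i, f, o, g = gates.chunk(4, dim=1)             # input, forget, output, candidate gates
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


cell = SepConvLSTMCell(in_ch=3, hid_ch=16)
h = c = torch.zeros(1, 16, 64, 64)
for frame in torch.rand(8, 1, 3, 64, 64):              # run the cell over an 8-frame clip
    h, c = cell(frame, (h, c))
```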
arXiv Detail & Related papers (2021-02-21T12:01:48Z) - All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions interpolating one frame at a time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramidal style network in the temporal domain to complete the multi-frame task in one-shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z)