Related papers: FPANet: Frequency-based Video Demoireing using Frame-level Post Alignment

URL: http://arxiv.org/abs/2301.07330v2
Date: Mon, 19 Jun 2023 16:10:19 GMT
Title: FPANet: Frequency-based Video Demoireing using Frame-level Post Alignment
Authors: Gyeongrok Oh, Heon Gu, Jinkyu Kim, Sangpil Kim
Abstract summary: We propose a novel model called FPANet that learns filters in both frequency and spatial domains. We demonstrate the effectiveness of our proposed method with a publicly available large-scale dataset.
Score: 6.507353572917133
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Interference between overlapping grid patterns creates moire patterns, degrading the visual quality of an image of a digital display screen captured with an ordinary digital camera. Removing such moire patterns is challenging due to their complex patterns of diverse sizes and color distortions. Existing approaches mainly focus on filtering in the spatial domain and fail to remove large-scale moire patterns. In this paper, we propose a novel model called FPANet that learns filters in both frequency and spatial domains, improving the restoration quality by removing moire patterns of various sizes. To further enhance restoration, our model takes multiple consecutive frames, learning to extract frame-invariant content features and outputting better-quality, temporally consistent images. We demonstrate the effectiveness of our proposed method on a publicly available large-scale dataset, observing that it outperforms state-of-the-art approaches, including ESDNet, VDmoire, MBCNN, WDNet, UNet, and DMCNN, in terms of image and video quality metrics such as PSNR, SSIM, LPIPS, FVD, and FSIM.

Related papers

Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model [92.61216319417208]
We propose a novel diffusion model (DM)-based framework for image deblurring. The framework uses the DM to generate prior knowledge that aids in recovering the textures of blurry images. To fully exploit the generated texture priors, we present the Texture Transfer Transformer layer (TTformer).
arXiv Detail & Related papers (2025-07-18T01:50:31Z)
Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoiréing through targeted frequency separation. Our method performs an effective frequency decomposition that explicitly splits moiré patterns into high-frequency spatially-localized textures and low-frequency scale-robust color distortions. Experiments on various demoiréing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z)
DiffuEraser: A Diffusion Model for Video Inpainting [13.292164408616257]
We introduce DiffuEraser, a video inpainting model based on stable diffusion, to fill masked regions with greater details and more coherent structures. We also expand the temporal receptive fields of both the prior model and DiffuEraser, and further enhance consistency by leveraging the temporal smoothing property of Video Diffusion Models.
arXiv Detail & Related papers (2025-01-17T08:03:02Z)
Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting [0.1696421797495086]
Current image stitching methods produce noticeable seams in challenging scenarios such as uneven hue and large parallax. We propose the Reference-Driven Inpainting Stitcher (RDIStitcher) to reformulate the image fusion and rectangling as a reference-based inpainting model. We present the Multimodal Large Language Models (MLLMs)-based metrics, offering a new perspective on evaluating stitched image quality.
arXiv Detail & Related papers (2024-11-15T16:05:01Z)
Multimodal Instruction Tuning with Hybrid State Space Models [25.921044010033267]
Long context is crucial for enhancing the recognition and understanding capabilities of multimodal large language models. We propose a novel approach using a hybrid transformer-MAMBA model to efficiently handle long contexts in multimodal applications. Our model enhances inference efficiency for high-resolution images and high-frame-rate videos by about 4 times compared to current models.
arXiv Detail & Related papers (2024-11-13T18:19:51Z)
SEM-Net: Efficient Pixel Modelling for image inpainting with Spatially Enhanced SSM [11.447968918063335]
Image inpainting aims to repair a partially damaged image based on the information from known regions of the images. SEM-Net is a novel visual State Space model (SSM) vision network, modelling corrupted images at the pixel level while capturing long-range dependencies (LRDs) in state space.
arXiv Detail & Related papers (2024-11-10T00:35:14Z)
A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior. We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information. We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation. Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model. Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise. MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains. Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z)
MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image. Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
ShapeMoiré: Channel-Wise Shape-Guided Network for Image Demoiréing [19.56605254816149]
Photographing optoelectronic displays often introduces unwanted moiré patterns due to analog signal interference. This work identifies two problems that are largely ignored by existing image demoiréing approaches. We propose a ShapeMoiré method to aid in image demoiréing.
arXiv Detail & Related papers (2024-04-28T12:12:08Z)
AADNet: Attention aware Demoiréing Network [2.1626093085892144]
Moire patterns frequently appear in photographs captured with mobile devices and digital cameras. We propose a novel lightweight architecture, AADNet, for high-resolution image demoiréing.
arXiv Detail & Related papers (2024-03-13T09:48:11Z)
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos. Our framework achieves global style and local texture temporal consistency at a low cost.
arXiv Detail & Related papers (2023-06-13T17:52:23Z)
DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras. Our method can capture time-varying surface details without needing pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
Restoration of Video Frames from a Single Blurred Image with Motion Understanding [69.90724075337194]
We propose a novel framework to generate clean video frames from a single motion-blurred image. We formulate video restoration from a single blurred image as an inverse problem by setting the clean image sequence and its respective motion as latent factors. Our framework is based on an encoder-decoder structure with spatial transformer network modules.
arXiv Detail & Related papers (2021-04-19T08:32:57Z)
ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions. We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space. Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z)
Learning Joint Spatial-Temporal Transformations for Video Inpainting [58.939131620135235]
We propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting. We simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss.
arXiv Detail & Related papers (2020-07-20T16:35:48Z)
Wavelet-Based Dual-Branch Network for Image Demoireing [148.91145614517015]
We design a wavelet-based dual-branch network (WDNet) with a spatial attention mechanism for image demoireing. Our network removes moire patterns in the wavelet domain to separate the frequencies of moire patterns from the image content. Experiments demonstrate the effectiveness of our method, and we further show that WDNet generalizes to removing moire artifacts on non-screen images.
arXiv Detail & Related papers (2020-07-14T16:44:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.