FPANet: Frequency-based Video Demoireing using Frame-level Post
Alignment
- URL: http://arxiv.org/abs/2301.07330v2
- Date: Mon, 19 Jun 2023 16:10:19 GMT
- Title: FPANet: Frequency-based Video Demoireing using Frame-level Post
Alignment
- Authors: Gyeongrok Oh, Heon Gu, Jinkyu Kim, Sangpil Kim
- Abstract summary: We propose a novel model called FPANet that learns filters in both frequency and spatial domains.
We demonstrate the effectiveness of our proposed method with a publicly available large-scale dataset.
- Score: 6.507353572917133
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Interference between overlapping gird patterns creates moire patterns,
degrading the visual quality of an image that captures a screen of a digital
display device by an ordinary digital camera. Removing such moire patterns is
challenging due to their complex patterns of diverse sizes and color
distortions. Existing approaches mainly focus on filtering out in the spatial
domain, failing to remove a large-scale moire pattern. In this paper, we
propose a novel model called FPANet that learns filters in both frequency and
spatial domains, improving the restoration quality by removing various sizes of
moire patterns. To further enhance, our model takes multiple consecutive
frames, learning to extract frame-invariant content features and outputting
better quality temporally consistent images. We demonstrate the effectiveness
of our proposed method with a publicly available large-scale dataset, observing
that ours outperforms the state-of-the-art approaches, including ESDNet,
VDmoire, MBCNN, WDNet, UNet, and DMCNN, in terms of the image and video quality
metrics, such as PSNR, SSIM, LPIPS, FVD, and FSIM.
Related papers
- DiffuEraser: A Diffusion Model for Video Inpainting [13.292164408616257]
We introduce DiffuEraser, a video inpainting model based on stable diffusion, to fill masked regions with greater details and more coherent structures.
We also expand the temporal receptive fields of both the prior model and DiffuEraser, and further enhance consistency by leveraging the temporal smoothing property of Video Diffusion Models.
arXiv Detail & Related papers (2025-01-17T08:03:02Z) - Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting [0.17975553762582286]
Current image stitching methods produce noticeable seams in challenging scenarios such as uneven hue and large parallax.
We propose the Reference-Driven Inpainting Stitcher (RDIStitcher) to reformulate the image fusion and rectangling as a reference-based inpainting model.
We present the Multimodal Large Language Models (MLLMs)-based metrics, offering a new perspective on evaluating stitched image quality.
arXiv Detail & Related papers (2024-11-15T16:05:01Z) - Multimodal Instruction Tuning with Hybrid State Space Models [25.921044010033267]
Long context is crucial for enhancing the recognition and understanding capabilities of multimodal large language models.
We propose a novel approach using a hybrid transformer-MAMBA model to efficiently handle long contexts in multimodal applications.
Our model enhances inference efficiency for high-resolution images and high-frame-rate videos by about 4 times compared to current models.
arXiv Detail & Related papers (2024-11-13T18:19:51Z) - Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise.
MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains.
Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z) - Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos.
Our framework achieves global style and local texture temporal consistency at a low cost.
arXiv Detail & Related papers (2023-06-13T17:52:23Z) - DeepMultiCap: Performance Capture of Multiple Characters Using Sparse
Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time varying surface details without the need of using pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z) - Restoration of Video Frames from a Single Blurred Image with Motion
Understanding [69.90724075337194]
We propose a novel framework to generate clean video frames from a single motion-red image.
We formulate video restoration from a single blurred image as an inverse problem by setting clean image sequence and their respective motion as latent factors.
Our framework is based on anblur-decoder structure with spatial transformer network modules.
arXiv Detail & Related papers (2021-04-19T08:32:57Z) - ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z) - Learning Joint Spatial-Temporal Transformations for Video Inpainting [58.939131620135235]
We propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.
We simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss.
arXiv Detail & Related papers (2020-07-20T16:35:48Z) - Wavelet-Based Dual-Branch Network for Image Demoireing [148.91145614517015]
We design a wavelet-based dual-branch network (WDNet) with a spatial attention mechanism for image demoireing.
Our network removes moire patterns in the wavelet domain to separate the frequencies of moire patterns from the image content.
Experiments demonstrate the effectiveness of our method, and we further show that WDNet generalizes to removing moire artifacts on non-screen images.
arXiv Detail & Related papers (2020-07-14T16:44:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.