Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving
- URL: http://arxiv.org/abs/2011.13611v3
- Date: Thu, 5 Aug 2021 03:33:15 GMT
- Title: Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving
- Authors: Mu Cai, Hong Zhang, Huijuan Huang, Qichuan Geng, Yixuan Li, Gao Huang
- Abstract summary: We propose a novel frequency domain image translation framework, exploiting frequency information for enhancing the image generation process.
Our key idea is to decompose the image into low-frequency and high-frequency components, where the high-frequency feature captures object structure akin to the identity.
Extensive experiments and ablations show that FDIT effectively preserves the identity of the source image, and produces photo-realistic images.
- Score: 36.606114597585396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-to-image translation has been revolutionized with GAN-based methods.
However, existing methods lack the ability to preserve the identity of the
source domain. As a result, synthesized images can often over-adapt to the
reference domain, losing important structural characteristics and suffering
from suboptimal visual quality. To solve these challenges, we propose a novel
frequency domain image translation (FDIT) framework, exploiting frequency
information for enhancing the image generation process. Our key idea is to
decompose the image into low-frequency and high-frequency components, where the
high-frequency feature captures object structure akin to the identity. Our
training objective facilitates the preservation of frequency information in
both pixel space and Fourier spectral space. We broadly evaluate FDIT across
five large-scale datasets and multiple tasks including image translation and
GAN inversion. Extensive experiments and ablations show that FDIT effectively
preserves the identity of the source image, and produces photo-realistic
images. FDIT establishes state-of-the-art performance, reducing the average FID
score by 5.6% compared to the previous best method.
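The decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not say how FDIT separates the frequency bands or how its losses are defined, so everything here is an assumption for illustration only: the ideal circular low-pass mask, the `cutoff` value, and the helper names `split_frequencies` and `spectral_l1` are hypothetical and are not taken from the paper.

```python
import numpy as np


def split_frequencies(image, cutoff=0.1):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    Assumption: an ideal circular low-pass mask in the Fourier domain.
    `cutoff` is a radius in cycles per pixel (0.5 is the Nyquist limit).
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)
    low_mask = radius <= cutoff      # keep only frequencies near DC

    spectrum = np.fft.fft2(image)
    low = np.real(np.fft.ifft2(spectrum * low_mask))
    high = image - low               # residual holds edges and fine structure
    return low, high


def spectral_l1(a, b):
    """Mean absolute difference between the 2-D Fourier spectra of a and b.

    A hypothetical stand-in for a loss in "Fourier spectral space".
    """
    return float(np.mean(np.abs(np.fft.fft2(a) - np.fft.fft2(b))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.random((64, 64))
    low, high = split_frequencies(source, cutoff=0.1)
    # Pixel-space comparison of the high-frequency (structure) component,
    # and a spectral comparison of the full image against its low band.
    print("pixel-space high-frequency energy:", float(np.mean(high ** 2)))
    print("spectral distance to low band:", spectral_l1(source, low))
```

In this sketch the high-frequency part is defined as the residual, so the two components always sum back to the input. A Gaussian blur could replace the hard mask to avoid ringing artifacts.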
Related papers
- WaveFace: Authentic Face Restoration with Efficient Frequency Recovery [74.73492472409447]
Diffusion models are criticized for two problems: 1) slow training and inference speed, and 2) failure to preserve identity and recover fine-grained facial details.
We propose WaveFace to solve the problems in the frequency domain, where low- and high-frequency components decomposed by wavelet transformation are considered individually.
We show that WaveFace outperforms state-of-the-art methods in authenticity, especially in terms of identity preservation.
arXiv Detail & Related papers (2024-03-19T14:27:24Z)
- Spectrum Translation for Refinement of Image Generation (STIG) Based on Contrastive Learning and Spectral Filter Profile [15.5188527312094]
We propose a framework to mitigate the disparity in frequency domain of the generated images.
This is realized by spectrum translation for the refinement of image generation (STIG) based on contrastive learning.
We evaluate our framework across eight fake image datasets and various cutting-edge models to demonstrate the effectiveness of STIG.
arXiv Detail & Related papers (2024-03-08T06:39:24Z)
- Misalignment-Robust Frequency Distribution Loss for Image Transformation [51.0462138717502]
This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution.
We introduce a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain.
Our method is empirically proven effective as a training constraint due to the thoughtful utilization of global information in the frequency domain.
arXiv Detail & Related papers (2024-02-28T09:27:41Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- A Scale-Arbitrary Image Super-Resolution Network Using Frequency-domain Information [42.55177009667711]
Image super-resolution (SR) is a technique to recover lost high-frequency information in low-resolution (LR) images.
In this paper, we study image features in the frequency domain to design a novel scale-arbitrary image SR network.
arXiv Detail & Related papers (2022-12-08T15:10:49Z)
- Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring [39.720032882926176]
We present an effective and efficient method that explores the properties of Transformers in the frequency domain for high-quality image deblurring.
We formulate the proposed FSAS and DFFN into an asymmetrical network based on an encoder and decoder architecture.
arXiv Detail & Related papers (2022-11-22T13:08:03Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.