Frequency Domain Image Translation: More Photo-realistic, Better
Identity-preserving
- URL: http://arxiv.org/abs/2011.13611v3
- Date: Thu, 5 Aug 2021 03:33:15 GMT
- Title: Frequency Domain Image Translation: More Photo-realistic, Better
Identity-preserving
- Authors: Mu Cai, Hong Zhang, Huijuan Huang, Qichuan Geng, Yixuan Li, Gao Huang
- Abstract summary: We propose a novel frequency domain image translation framework, exploiting frequency information for enhancing the image generation process.
Our key idea is to decompose the image into low-frequency and high-frequency components, where the high-frequency feature captures object structure akin to the identity.
Extensive experiments and ablations show that FDIT effectively preserves the identity of the source image, and produces photo-realistic images.
- Score: 36.606114597585396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-to-image translation has been revolutionized with GAN-based methods.
However, existing methods lack the ability to preserve the identity of the
source domain. As a result, synthesized images can often over-adapt to the
reference domain, losing important structural characteristics and suffering
from suboptimal visual quality. To solve these challenges, we propose a novel
frequency domain image translation (FDIT) framework, exploiting frequency
information for enhancing the image generation process. Our key idea is to
decompose the image into low-frequency and high-frequency components, where the
high-frequency feature captures object structure akin to the identity. Our
training objective facilitates the preservation of frequency information in
both pixel space and Fourier spectral space. We broadly evaluate FDIT across
five large-scale datasets and multiple tasks including image translation and
GAN inversion. Extensive experiments and ablations show that FDIT effectively
preserves the identity of the source image, and produces photo-realistic
images. FDIT establishes state-of-the-art performance, reducing the average FID
score by 5.6% compared to the previous best method.
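The core idea of the abstract, splitting an image into low- and high-frequency components and working in both pixel and Fourier spectral space, can be illustrated with a minimal sketch. This is not the authors' FDIT implementation; the Gaussian low-pass mask and the `sigma` value are illustrative assumptions.

```python
import numpy as np

def freq_decompose(img, sigma=10.0):
    """Split a grayscale image into low- and high-frequency components.

    A Gaussian mask centred on the zero frequency acts as the low-pass
    filter in the Fourier domain; the residual is the high-frequency
    component, which carries edge/structure (identity) information.
    `sigma` (in frequency-pixel units) is an illustrative choice,
    not a value from the paper.
    """
    F = np.fft.fftshift(np.fft.fft2(img))      # centre the spectrum
    h, w = img.shape
    yy, xx = np.meshgrid(np.arange(h) - h // 2,
                         np.arange(w) - w // 2, indexing="ij")
    lowpass = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    low = np.fft.ifft2(np.fft.ifftshift(F * lowpass)).real
    high = img - low                            # exact residual
    return low, high

# Usage: the two components reconstruct the input exactly by construction.
img = np.random.rand(64, 64)
low, high = freq_decompose(img)
```

Defining the high-frequency part as the residual guarantees lossless reconstruction (`low + high == img`), which is what lets a training objective constrain each band separately without discarding information.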
Related papers
- Self-Bootstrapping for Versatile Test-Time Adaptation [29.616417768209114]
We develop a versatile test-time adaptation (TTA) objective for a variety of tasks.
We achieve this through a self-bootstrapping scheme that optimizes prediction consistency between the test image (as target) and its deteriorated view.
Experiments show that, either independently or as a plug-and-play module, our method achieves superior results across classification, segmentation, and 3D monocular detection tasks.
arXiv Detail & Related papers (2025-04-10T05:45:07Z) - Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation [27.576174611043367]
Masked Image Modeling (MIM) has garnered significant attention in self-supervised learning, thanks to its impressive capacity to learn scalable visual representations tailored for downstream tasks.
However, images inherently contain abundant redundant information, leading the pixel-based MIM reconstruction process to focus excessively on finer details such as textures, thus prolonging training times unnecessarily.
In this study, we leverage wavelet transform as a tool for efficient representation learning to expedite the training process of MIM.
arXiv Detail & Related papers (2025-03-02T08:11:26Z) - WaveFace: Authentic Face Restoration with Efficient Frequency Recovery [74.73492472409447]
Diffusion models are criticized for two problems: 1) slow training and inference speed, and 2) failure to preserve identity and recover fine-grained facial details.
We propose WaveFace to solve the problems in the frequency domain, where low- and high-frequency components decomposed by wavelet transformation are considered individually.
We show that WaveFace outperforms state-of-the-art methods in authenticity, especially in terms of identity preservation.
arXiv Detail & Related papers (2024-03-19T14:27:24Z) - Spectrum Translation for Refinement of Image Generation (STIG) Based on
Contrastive Learning and Spectral Filter Profile [15.5188527312094]
We propose a framework to mitigate the frequency-domain disparity of generated images.
This is realized by spectrum translation for the refinement of image generation (STIG) based on contrastive learning.
We evaluate our framework across eight fake image datasets and various cutting-edge models to demonstrate the effectiveness of STIG.
arXiv Detail & Related papers (2024-03-08T06:39:24Z) - Misalignment-Robust Frequency Distribution Loss for Image Transformation [51.0462138717502]
This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution.
We introduce a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain.
Our method is empirically proven effective as a training constraint due to the thoughtful utilization of global information in the frequency domain.
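The appeal of a frequency-domain training constraint can be seen in a toy version. The sketch below uses a plain L1 distance between amplitude spectra, an assumption for illustration, not the paper's FDL, which measures a distribution distance on frequency features. It does, however, show why frequency-domain comparisons tolerate spatial misalignment.

```python
import numpy as np

def freq_magnitude_loss(pred, target):
    """Toy frequency-domain loss: mean L1 distance between the
    amplitude spectra of two images. Illustrative only -- the actual
    FDL computes a distribution distance on frequency-domain features.
    """
    Fp = np.abs(np.fft.fft2(pred))
    Ft = np.abs(np.fft.fft2(target))
    return np.mean(np.abs(Fp - Ft))

# A circularly shifted copy of an image has the same amplitude
# spectrum, so this loss is invariant to circular translation --
# unlike a pixel-space L1 loss, which would penalize the shift.
img = np.random.rand(32, 32)
shifted = np.roll(img, shift=5, axis=1)
```

Shifting an image only rotates the phase of its Fourier coefficients; the magnitudes are untouched, which is the intuition behind misalignment robustness.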
arXiv Detail & Related papers (2024-02-28T09:27:41Z) - Unified Frequency-Assisted Transformer Framework for Detecting and
Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
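The sub-band decomposition described above can be sketched with a single level of the 2-D Haar transform, a minimal stand-in (not the UFAFormer implementation) for the discrete wavelet transform that yields one approximation and three detail sub-bands.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the orthonormal 2-D Haar wavelet transform.

    Returns four half-resolution sub-bands: LL (coarse approximation)
    plus LH/HL/HH detail bands, where forgery artifacts or fine
    structure tend to concentrate. Assumes even height and width.
    """
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-low: coarse approximation
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

# Usage: the orthonormal transform preserves total energy.
img = np.random.rand(64, 64)
ll, lh, hl, hh = haar_dwt2(img)
```

With the 1/2 normalization the transform is orthonormal, so the summed squared energy of the four sub-bands equals that of the input, meaning no information is lost by processing bands separately.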
arXiv Detail & Related papers (2023-09-18T11:06:42Z) - A Scale-Arbitrary Image Super-Resolution Network Using Frequency-domain
Information [42.55177009667711]
Image super-resolution (SR) is a technique to recover lost high-frequency information in low-resolution (LR) images.
In this paper, we study image features in the frequency domain to design a novel scale-arbitrary image SR network.
arXiv Detail & Related papers (2022-12-08T15:10:49Z) - Efficient Frequency Domain-based Transformers for High-Quality Image
Deblurring [39.720032882926176]
We present an effective and efficient method that explores the properties of Transformers in the frequency domain for high-quality image deblurring.
We formulate the proposed FSAS and DFFN into an asymmetrical network based on an encoder and decoder architecture.
arXiv Detail & Related papers (2022-11-22T13:08:03Z) - Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.