Frequency Domain Image Translation: More Photo-realistic, Better
Identity-preserving
- URL: http://arxiv.org/abs/2011.13611v3
- Date: Thu, 5 Aug 2021 03:33:15 GMT
- Title: Frequency Domain Image Translation: More Photo-realistic, Better
Identity-preserving
- Authors: Mu Cai, Hong Zhang, Huijuan Huang, Qichuan Geng, Yixuan Li, Gao Huang
- Abstract summary: We propose a novel frequency domain image translation framework, exploiting frequency information for enhancing the image generation process.
Our key idea is to decompose the image into low-frequency and high-frequency components, where the high-frequency feature captures object structure akin to the identity.
Extensive experiments and ablations show that FDIT effectively preserves the identity of the source image, and produces photo-realistic images.
- Score: 36.606114597585396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-to-image translation has been revolutionized with GAN-based methods.
However, existing methods lack the ability to preserve the identity of the
source domain. As a result, synthesized images can often over-adapt to the
reference domain, losing important structural characteristics and suffering
from suboptimal visual quality. To solve these challenges, we propose a novel
frequency domain image translation (FDIT) framework, exploiting frequency
information for enhancing the image generation process. Our key idea is to
decompose the image into low-frequency and high-frequency components, where the
high-frequency feature captures object structure akin to the identity. Our
training objective facilitates the preservation of frequency information in
both pixel space and Fourier spectral space. We broadly evaluate FDIT across
five large-scale datasets and multiple tasks including image translation and
GAN inversion. Extensive experiments and ablations show that FDIT effectively
preserves the identity of the source image, and produces photo-realistic
images. FDIT establishes state-of-the-art performance, reducing the average FID
score by 5.6% compared to the previous best method.
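The core idea of the abstract, splitting an image into low- and high-frequency components and working in both pixel and Fourier spectral space, can be illustrated with a minimal sketch. This is not the authors' FDIT implementation; the Gaussian low-pass mask and the `sigma` value are illustrative assumptions.

```python
import numpy as np

def freq_decompose(img, sigma=10.0):
    """Split a grayscale image into low- and high-frequency components.

    A Gaussian mask centred on the zero frequency acts as the low-pass
    filter in the Fourier domain; the residual is the high-frequency
    component, which carries edge/structure (identity) information.
    `sigma` (in frequency-pixel units) is an illustrative choice,
    not a value from the paper.
    """
    F = np.fft.fftshift(np.fft.fft2(img))      # centre the spectrum
    h, w = img.shape
    yy, xx = np.meshgrid(np.arange(h) - h // 2,
                         np.arange(w) - w // 2, indexing="ij")
    lowpass = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    low = np.fft.ifft2(np.fft.ifftshift(F * lowpass)).real
    high = img - low                            # exact residual
    return low, high

# Usage: the two components reconstruct the input exactly by construction.
img = np.random.rand(64, 64)
low, high = freq_decompose(img)
```

Defining the high-frequency part as the residual guarantees lossless reconstruction (`low + high == img`), which is what lets a training objective constrain each band separately without discarding information.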
Related papers
- Self-Bootstrapping for Versatile Test-Time Adaptation [29.616417768209114]
We develop a versatile test-time adaptation (TTA) objective for a variety of tasks.
We achieve this through a self-bootstrapping scheme that optimizes prediction consistency between the test image (as target) and its deteriorated view.
Experiments show that, either independently or as a plug-and-play module, our method achieves superior results across classification, segmentation, and 3D monocular detection tasks.
arXiv Detail & Related papers (2025-04-10T05:45:07Z) - Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation [27.576174611043367]
Masked Image Modeling (MIM) has garnered significant attention in self-supervised learning, thanks to its impressive capacity to learn scalable visual representations tailored for downstream tasks.
However, images inherently contain abundant redundant information, leading the pixel-based MIM reconstruction process to focus excessively on finer details such as textures, thus prolonging training times unnecessarily.
In this study, we leverage wavelet transform as a tool for efficient representation learning to expedite the training process of MIM.
arXiv Detail & Related papers (2025-03-02T08:11:26Z) - WaveFace: Authentic Face Restoration with Efficient Frequency Recovery [74.73492472409447]
Diffusion models are criticized for two problems: 1) slow training and inference speed, and 2) failure to preserve identity and recover fine-grained facial details.
We propose WaveFace to solve the problems in the frequency domain, where low- and high-frequency components decomposed by wavelet transformation are considered individually.
We show that WaveFace outperforms state-of-the-art methods in authenticity, especially in terms of identity preservation.
arXiv Detail & Related papers (2024-03-19T14:27:24Z) - Spectrum Translation for Refinement of Image Generation (STIG) Based on
Contrastive Learning and Spectral Filter Profile [15.5188527312094]
We propose a framework to mitigate the frequency-domain disparity of generated images.
This is realized by spectrum translation for the refinement of image generation (STIG) based on contrastive learning.
We evaluate our framework across eight fake image datasets and various cutting-edge models to demonstrate the effectiveness of STIG.
arXiv Detail & Related papers (2024-03-08T06:39:24Z) - Misalignment-Robust Frequency Distribution Loss for Image Transformation [51.0462138717502]
This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution.
We introduce a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain.
Our method is empirically proven effective as a training constraint due to the thoughtful utilization of global information in the frequency domain.
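The appeal of a frequency-domain training constraint can be seen in a toy version. The sketch below uses a plain L1 distance between amplitude spectra, an assumption for illustration, not the paper's FDL, which measures a distribution distance on frequency features. It does, however, show why frequency-domain comparisons tolerate spatial misalignment.

```python
import numpy as np

def freq_magnitude_loss(pred, target):
    """Toy frequency-domain loss: mean L1 distance between the
    amplitude spectra of two images. Illustrative only -- the actual
    FDL computes a distribution distance on frequency-domain features.
    """
    Fp = np.abs(np.fft.fft2(pred))
    Ft = np.abs(np.fft.fft2(target))
    return np.mean(np.abs(Fp - Ft))

# A circularly shifted copy of an image has the same amplitude
# spectrum, so this loss is invariant to circular translation --
# unlike a pixel-space L1 loss, which would penalize the shift.
img = np.random.rand(32, 32)
shifted = np.roll(img, shift=5, axis=1)
```

Shifting an image only rotates the phase of its Fourier coefficients; the magnitudes are untouched, which is the intuition behind misalignment robustness.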
arXiv Detail & Related papers (2024-02-28T09:27:41Z) - Unified Frequency-Assisted Transformer Framework for Detecting and
Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
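The sub-band decomposition described above can be sketched with a single level of the 2-D Haar transform, a minimal stand-in (not the UFAFormer implementation) for the discrete wavelet transform that yields one approximation and three detail sub-bands.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the orthonormal 2-D Haar wavelet transform.

    Returns four half-resolution sub-bands: LL (coarse approximation)
    plus LH/HL/HH detail bands, where forgery artifacts or fine
    structure tend to concentrate. Assumes even height and width.
    """
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-low: coarse approximation
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

# Usage: the orthonormal transform preserves total energy.
img = np.random.rand(64, 64)
ll, lh, hl, hh = haar_dwt2(img)
```

With the 1/2 normalization the transform is orthonormal, so the summed squared energy of the four sub-bands equals that of the input, meaning no information is lost by processing bands separately.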
arXiv Detail & Related papers (2023-09-18T11:06:42Z) - A Scale-Arbitrary Image Super-Resolution Network Using Frequency-domain
Information [42.55177009667711]
Image super-resolution (SR) is a technique to recover lost high-frequency information in low-resolution (LR) images.
In this paper, we study image features in the frequency domain to design a novel scale-arbitrary image SR network.
arXiv Detail & Related papers (2022-12-08T15:10:49Z) - Efficient Frequency Domain-based Transformers for High-Quality Image
Deblurring [39.720032882926176]
We present an effective and efficient method that explores the properties of Transformers in the frequency domain for high-quality image deblurring.
We formulate the proposed FSAS and DFFN into an asymmetrical network based on an encoder and decoder architecture.
arXiv Detail & Related papers (2022-11-22T13:08:03Z) - Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.