WINE: Wavelet-Guided GAN Inversion and Editing for High-Fidelity Refinement
- URL: http://arxiv.org/abs/2210.09655v2
- Date: Tue, 14 Jan 2025 14:22:05 GMT
- Title: WINE: Wavelet-Guided GAN Inversion and Editing for High-Fidelity Refinement
- Authors: Chaewon Kim, Seung-Jun Moon, Gyeong-Moon Park
- Abstract summary: WINE is a Wavelet-guided GAN Inversion aNd Editing model, which transfers high-frequency information through wavelet coefficients. We show WINE outperforms existing state-of-the-art GAN inversion models with a fine balance between editability and reconstruction quality.
- Score: 9.517232831394459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advanced GAN inversion models aim to convey high-fidelity information from original images to generators through generator tuning or high-dimensional feature learning. Despite these efforts, accurately reconstructing image-specific details remains a challenge due to inherent limitations in both training and structure, leading to a bias towards low-frequency information. In this paper, we look into the widely used pixel loss in GAN inversion, revealing its predominant focus on the reconstruction of low-frequency features. We then propose WINE, a Wavelet-guided GAN Inversion aNd Editing model, which transfers high-frequency information through wavelet coefficients via a newly proposed wavelet loss and wavelet fusion scheme. Notably, WINE is the first attempt to interpret GAN inversion in the frequency domain. Our experimental results showcase the precision of WINE in preserving high-frequency details and enhancing image quality. Even in editing scenarios, WINE outperforms existing state-of-the-art GAN inversion models with a fine balance between editability and reconstruction quality.
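The paper's central ingredient, a loss on high-frequency wavelet coefficients added on top of the usual pixel loss, can be illustrated with a minimal sketch. The single-level Haar decomposition, the L1 distance, and the weighting below are illustrative assumptions, not the authors' released implementation; the paper's exact subband handling and loss weighting may differ.

```python
import torch
import torch.nn.functional as F

def haar_dwt(x):
    """Single-level 2D Haar transform of a (B, C, H, W) tensor with even H and W.

    Returns the low-frequency approximation (LL) and the three
    high-frequency detail subbands (LH, HL, HH).
    """
    a = x[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def wavelet_guided_loss(recon, target, hf_weight=1.0):
    """Pixel L1 loss plus an extra penalty on the high-frequency Haar subbands."""
    pixel = F.l1_loss(recon, target)
    _, lh_r, hl_r, hh_r = haar_dwt(recon)
    _, lh_t, hl_t, hh_t = haar_dwt(target)
    high_freq = (F.l1_loss(lh_r, lh_t)
                 + F.l1_loss(hl_r, hl_t)
                 + F.l1_loss(hh_r, hh_t))
    return pixel + hf_weight * high_freq
```

In an inversion loop such a term would simply be added to the existing objective, e.g. loss = wavelet_guided_loss(G(w), x), so that gradients explicitly push the generator towards reproducing fine textures rather than only low-frequency structure.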
Related papers
- Wavelet-based Variational Autoencoders for High-Resolution Image Generation [0.0]
Variational Autoencoders (VAEs) are powerful generative models capable of learning compact latent representations.
In this paper, we explore a novel wavelet-based approach (Wavelet-VAE) in which the latent space is constructed using multi-scale Haar wavelet coefficients.
arXiv Detail & Related papers (2025-04-16T13:51:41Z) - SING: Semantic Image Communications using Null-Space and INN-Guided Diffusion Models [52.40011613324083]
Deep joint source-channel coding (DeepJSCC) systems have recently demonstrated remarkable performance in wireless image transmission.
Existing methods focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality.
We propose SING, a novel framework that formulates the recovery of high-quality images from corrupted reconstructions as an inverse problem.
arXiv Detail & Related papers (2025-03-16T12:32:11Z) - Identity-Preserving Text-to-Video Generation by Frequency Decomposition [52.19475797580653]
Identity-preserving text-to-video (IPT2V) generation aims to create high-fidelity videos with consistent human identity.
This paper pushes the technical frontier of IPT2V in two directions that have not been resolved in the literature.
We propose ConsisID, a tuning-free DiT-based controllable IPT2V model to keep human identity consistent in the generated video.
arXiv Detail & Related papers (2024-11-26T13:58:24Z) - Local Implicit Wavelet Transformer for Arbitrary-Scale Super-Resolution [15.610136214020947]
Implicit neural representations have recently demonstrated promising potential in arbitrary-scale Super-Resolution (SR) of images.
Most existing methods predict each pixel of the SR image from the queried coordinate and an ensemble of nearby features.
We propose the Local Implicit Wavelet Transformer (LIWT) to enhance the restoration of high-frequency texture details.
arXiv Detail & Related papers (2024-11-10T12:21:14Z) - Frequency-aware Feature Fusion for Dense Image Prediction [99.85757278772262]
We propose Frequency-Aware Feature Fusion (FreqFusion) for dense image prediction tasks.
FreqFusion integrates an Adaptive Low-Pass Filter (ALPF) generator, an offset generator, and an Adaptive High-Pass Filter (AHPF) generator.
Comprehensive visualization and quantitative analysis demonstrate that FreqFusion effectively improves feature consistency and sharpens object boundaries.
arXiv Detail & Related papers (2024-08-23T07:30:34Z) - HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models [56.112302700630806]
We introduce an innovative algorithm named HiFi Tuner to enhance the appearance preservation of objects during personalized image generation.
Key enhancements include the utilization of mask guidance, a novel parameter regularization technique, and the incorporation of step-wise subject representations.
We extend our method to a novel image editing task: substituting the subject in an image through textual manipulations.
arXiv Detail & Related papers (2023-11-30T02:33:29Z) - Reconstruct-and-Generate Diffusion Model for Detail-Preserving Image Denoising [16.43285056788183]
We propose a novel approach called the Reconstruct-and-Generate Diffusion Model (RnG).
Our method leverages a reconstructive denoising network to recover the majority of the underlying clean signal.
It employs a diffusion algorithm to generate residual high-frequency details, thereby enhancing visual quality.
arXiv Detail & Related papers (2023-09-19T16:01:20Z) - Multi-stage image denoising with the wavelet transform [125.2251438120701]
Deep convolutional neural networks (CNNs) are used for image denoising via automatically mining accurate structure information.
We propose a multi-stage image denoising CNN with the wavelet transform (MWDCNN) built from three stages, i.e., a dynamic convolutional block (DCB), two cascaded wavelet transform and enhancement blocks (WEBs), and a residual block (RB).
arXiv Detail & Related papers (2022-09-26T03:28:23Z) - FreqNet: A Frequency-domain Image Super-Resolution Network with Discrete Cosine Transform [16.439669339293747]
Single image super-resolution (SISR) is an ill-posed problem that aims to obtain high-resolution (HR) output from low-resolution (LR) input.
Despite high peak signal-to-noise ratio (PSNR) results, it is difficult to determine whether the model correctly adds the desired high-frequency details.
We propose FreqNet, an intuitive pipeline from the frequency domain perspective, to solve this problem.
arXiv Detail & Related papers (2021-11-21T11:49:12Z) - High-Fidelity GAN Inversion for Image Attribute Editing [61.966946442222735]
We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing with image-specific details well-preserved.
With a low bit-rate latent code, previous works have difficulties in preserving high-fidelity details in reconstructed and edited images.
We propose a distortion consultation approach that employs a distortion map as a reference for high-fidelity reconstruction.
arXiv Detail & Related papers (2021-09-14T11:23:48Z) - Wavelet-Based Network For High Dynamic Range Imaging [64.66969585951207]
Existing methods, such as optical-flow-based and end-to-end deep-learning-based solutions, are error-prone in either detail restoration or ghosting-artifact removal.
In this work, we propose a novel frequency-guided end-to-end deep neural network (FNet) to conduct HDR fusion in the frequency domain, using the Discrete Wavelet Transform (DWT) to decompose inputs into different frequency bands.
The low-frequency signals are used to avoid specific ghosting artifacts, while the high-frequency signals are used for preserving details.
arXiv Detail & Related papers (2021-08-03T12:26:33Z) - Fourier Space Losses for Efficient Perceptual Image Super-Resolution [131.50099891772598]
We show that it is possible to improve the performance of a recently introduced efficient generator architecture solely with the application of our proposed loss functions.
We show that our losses' direct emphasis on the frequencies in Fourier-space significantly boosts the perceptual image quality.
The trained generator achieves results comparable to the state-of-the-art perceptual SR methods RankSRGAN and SRFlow while being 2.4x and 48x faster, respectively.
arXiv Detail & Related papers (2021-06-01T20:34:52Z) - Focal Frequency Loss for Image Reconstruction and Synthesis [125.7135706352493]
We show that narrowing gaps in the frequency domain can ameliorate image reconstruction and synthesis quality further.
We propose a novel focal frequency loss, which allows a model to adaptively focus on frequency components that are hard to synthesize.
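As a rough illustration of the adaptive weighting described in this entry (a sketch of the general idea, not the paper's official implementation), a focal-style frequency loss can be written by measuring errors in the Fourier domain and re-weighting each frequency by its own current error, so hard-to-synthesize components dominate the objective:

```python
import torch

def focal_frequency_loss(recon, target, alpha=1.0, eps=1e-8):
    """Illustrative focal-style frequency loss over (B, C, H, W) images.

    The squared error of each frequency component is re-weighted by its own
    (normalized, gradient-detached) magnitude, so frequencies that are
    currently reconstructed poorly receive larger gradients.
    """
    f_recon = torch.fft.fft2(recon, norm="ortho")
    f_target = torch.fft.fft2(target, norm="ortho")
    error = (f_recon - f_target).abs() ** 2            # per-frequency squared error
    weight = (error.sqrt() ** alpha).detach()          # focal weight, no gradient
    weight = weight / weight.amax(dim=(-2, -1), keepdim=True).clamp(min=eps)
    return (weight * error).mean()
```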
arXiv Detail & Related papers (2020-12-23T17:32:04Z) - Progressive Training of Multi-level Wavelet Residual Networks for Image Denoising [80.10533234415237]
This paper presents a multi-level wavelet residual network (MWRN) architecture as well as a progressive training scheme to improve image denoising performance.
Experiments on both synthetic and real-world noisy images show that our PT-MWRN performs favorably against the state-of-the-art denoising methods.
arXiv Detail & Related papers (2020-10-23T14:14:00Z) - Wavelet Integrated CNNs for Noise-Robust Image Classification [51.18193090255933]
We enhance CNNs by replacing max-pooling, strided convolution, and average-pooling with the Discrete Wavelet Transform (DWT).
WaveCNets, the wavelet integrated versions of VGG, ResNets, and DenseNet, achieve higher accuracy and better noise-robustness than their vanilla versions.
arXiv Detail & Related papers (2020-05-07T09:10:41Z)
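To make the pooling replacement in the WaveCNets entry above concrete, the sketch below swaps a stride-2 max-pool for a Haar low-pass downsampling layer; this is an assumed, simplified reading of the idea, and the paper's wavelet choice and handling of the detail subbands may differ.

```python
import torch
import torch.nn as nn

class HaarPool2d(nn.Module):
    """Downsample by 2x using the low-frequency (LL) Haar subband.

    Keeping only the smoothed LL coefficients reduces the aliasing of
    high-frequency noise that plain max-pooling or strided convolution
    would propagate into deeper layers.
    """

    def forward(self, x):
        # x: (B, C, H, W) with even H and W
        a = x[..., 0::2, 0::2]
        b = x[..., 0::2, 1::2]
        c = x[..., 1::2, 0::2]
        d = x[..., 1::2, 1::2]
        return (a + b + c + d) / 2.0  # LL subband, spatial size halved

# Example: drop-in replacement for nn.MaxPool2d(2) in a small convolutional block
block = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), HaarPool2d())
```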