Wavelet-based Variational Autoencoders for High-Resolution Image Generation
- URL: http://arxiv.org/abs/2504.13214v1
- Date: Wed, 16 Apr 2025 13:51:41 GMT
- Title: Wavelet-based Variational Autoencoders for High-Resolution Image Generation
- Authors: Andrew Kiruluta
- Abstract summary: Variational Autoencoders (VAEs) are powerful generative models capable of learning compact latent representations. In this paper, we explore a novel wavelet-based approach (Wavelet-VAE) in which the latent space is constructed using multi-scale Haar wavelet coefficients.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variational Autoencoders (VAEs) are powerful generative models capable of learning compact latent representations. However, conventional VAEs often generate relatively blurry images due to their assumption of an isotropic Gaussian latent space and constraints in capturing high-frequency details. In this paper, we explore a novel wavelet-based approach (Wavelet-VAE) in which the latent space is constructed using multi-scale Haar wavelet coefficients. We propose a comprehensive method to encode the image features into multi-scale detail and approximation coefficients and introduce a learnable noise parameter to maintain stochasticity. We thoroughly discuss how to reformulate the reparameterization trick, address the KL divergence term, and integrate wavelet sparsity principles into the training objective. Our experimental evaluation on CIFAR-10 and other high-resolution datasets demonstrates that the Wavelet-VAE improves visual fidelity and recovers higher-resolution details compared to conventional VAEs. We conclude with a discussion of advantages, potential limitations, and future research directions for wavelet-based generative modeling.
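As a rough illustration of the multi-scale Haar approximation and detail coefficients mentioned in the abstract, the following NumPy sketch decomposes an image into a coarse approximation plus per-level detail bands and reconstructs it exactly. It is not the paper's implementation: the function names (`haar_dwt2`, `haar_idwt2`, `haar_pyramid`, `haar_reconstruct`), the orthonormal scaling, and the 32x32 random test image are assumptions made for this example, and the learnable noise parameter, KL term, and sparsity penalty are left out.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2D Haar transform (hypothetical helper).

    Expects a 2D array with even height and width. Returns the approximation
    band and a tuple of three detail bands, each half the input size.
    """
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    d_cols = (a - b + c - d) / 2.0  # difference across columns
    d_rows = (a + b - c - d) / 2.0  # difference across rows
    d_diag = (a - b - c + d) / 2.0  # diagonal difference
    return approx, (d_cols, d_rows, d_diag)


def haar_idwt2(approx, details):
    """Invert haar_dwt2, doubling height and width."""
    d_cols, d_rows, d_diag = details
    h, w = approx.shape
    x = np.empty((2 * h, 2 * w), dtype=approx.dtype)
    x[0::2, 0::2] = (approx + d_cols + d_rows + d_diag) / 2.0
    x[0::2, 1::2] = (approx - d_cols + d_rows - d_diag) / 2.0
    x[1::2, 0::2] = (approx + d_cols - d_rows - d_diag) / 2.0
    x[1::2, 1::2] = (approx - d_cols - d_rows + d_diag) / 2.0
    return x


def haar_pyramid(x, levels):
    """Apply haar_dwt2 repeatedly to the approximation band."""
    approx = np.asarray(x, dtype=np.float64)
    coeffs = []  # detail bands, finest level first
    for _ in range(levels):
        approx, details = haar_dwt2(approx)
        coeffs.append(details)
    return approx, coeffs


def haar_reconstruct(approx, coeffs):
    """Rebuild the image from the coarsest approximation and all detail bands."""
    for details in reversed(coeffs):
        approx = haar_idwt2(approx, details)
    return approx


# Usage on an assumed 32x32 random image with three decomposition levels.
image = np.random.default_rng(0).random((32, 32))
coarse, detail_levels = haar_pyramid(image, levels=3)
restored = haar_reconstruct(coarse, detail_levels)
```

With three levels, `coarse` has shape `(4, 4)` and `restored` matches `image` up to floating-point error. Because this scaling is orthonormal, the total squared magnitude of the coefficients equals that of the input, which is one reason a sparsity penalty on the detail bands is a natural fit for this representation.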
Related papers
- Quaternion Wavelet-Conditioned Diffusion Models for Image Super-Resolution [4.307648859471193]
We introduce ResQu, a novel SR framework that integrates a quaternion wavelet preprocessing framework with latent diffusion models.
Our approach enhances the conditioning process by exploiting quaternion wavelet embeddings, which are dynamically integrated at different stages of denoising.
Our method achieves outstanding SR results, in many cases outperforming existing approaches in perceptual quality and standard evaluation metrics.
arXiv Detail & Related papers (2025-05-01T06:17:33Z) - A Hybrid Wavelet-Fourier Method for Next-Generation Conditional Diffusion Models [0.0]
We present a novel generative modeling framework, Wavelet-Fourier-Diffusion, which adapts the diffusion paradigm to hybrid frequency representations. We show how the hybrid frequency-based representation improves control over global coherence and fine texture synthesis.
arXiv Detail & Related papers (2025-04-04T17:11:04Z) - VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression [59.14355576912495]
NeRF-based video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences. The substantial data volumes pose significant challenges for storage and transmission. We propose VRVVC, a novel end-to-end joint variable-rate framework for video compression.
arXiv Detail & Related papers (2024-12-16T01:28:04Z) - Local Implicit Wavelet Transformer for Arbitrary-Scale Super-Resolution [15.610136214020947]
Implicit neural representations have recently demonstrated promising potential in arbitrary-scale Super-Resolution (SR) of images.
Most existing methods predict the pixel in the SR image based on the queried coordinate and ensemble nearby features.
We propose the Local Implicit Wavelet Transformer (LIWT) to enhance the restoration of high-frequency texture details.
arXiv Detail & Related papers (2024-11-10T12:21:14Z) - WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference.
Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner.
We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z) - Stage-by-stage Wavelet Optimization Refinement Diffusion Model for Sparse-View CT Reconstruction [14.037398189132468]
We present an innovative approach named the Stage-by-stage Wavelet Optimization Refinement Diffusion (SWORD) model for sparse-view CT reconstruction.
Specifically, we establish a unified mathematical model integrating low-frequency and high-frequency generative models, and solve it with an optimization procedure.
Our method is rooted in established optimization theory and comprises three distinct stages: low-frequency generation, high-frequency refinement, and domain transform.
arXiv Detail & Related papers (2023-08-30T10:48:53Z) - Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from long inference times, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z) - Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z) - Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z) - Universal Face Restoration With Memorized Modulation [73.34750780570909]
This paper proposes a Restoration with Memorized Modulation (RMM) framework for universal Blind Face Restoration (BFR).
We apply random noise as well as unsupervised wavelet memory to adaptively modulate the face-enhancement generator.
Experimental results show the superiority of the proposed method compared with the state-of-the-art methods, and a good generalization in the wild.
arXiv Detail & Related papers (2021-10-03T15:55:07Z) - Wavelet Transform-assisted Adaptive Generative Modeling for Colorization [15.814591440291652]
This study presents a novel scheme that exploits a score-based generative model in the wavelet domain to address the issue.
By taking advantage of the multi-scale and multi-channel representation via wavelet transform, the proposed model learns the priors from stacked wavelet coefficient components.
Experiments demonstrated remarkable improvements of the proposed model on colorization quality, particularly on colorization robustness and diversity.
arXiv Detail & Related papers (2021-07-09T07:12:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.