Distribution Matching Variational AutoEncoder
- URL: http://arxiv.org/abs/2512.07778v1
- Date: Mon, 08 Dec 2025 17:59:47 GMT
- Title: Distribution Matching Variational AutoEncoder
- Authors: Sen Ye, Jianning Pei, Mengde Xu, Shuyang Gu, Chunyu Wang, Liwei Wang, Han Hu
- Abstract summary: Existing approaches such as VAEs implicitly constrain the latent space without explicitly shaping its distribution. We introduce Distribution-Matching VAE (DMVAE), which explicitly aligns the encoder's latent distribution with an arbitrary reference distribution. Our results suggest that choosing a suitable latent distribution structure (achieved via distribution-level alignment) is key to bridging the gap between easy-to-model latents and high-fidelity image synthesis.
- Score: 24.58582338610613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most visual generative models compress images into a latent space before applying diffusion or autoregressive modelling. Yet, existing approaches such as VAEs and foundation-model-aligned encoders implicitly constrain the latent space without explicitly shaping its distribution, making it unclear which types of distributions are optimal for modeling. We introduce Distribution-Matching VAE (DMVAE), which explicitly aligns the encoder's latent distribution with an arbitrary reference distribution via a distribution matching constraint. This generalizes beyond the Gaussian prior of conventional VAEs, enabling alignment with distributions derived from self-supervised features, diffusion noise, or other prior distributions. With DMVAE, we can systematically investigate which latent distributions are more conducive to modeling, and we find that SSL-derived distributions provide an excellent balance between reconstruction fidelity and modeling efficiency, reaching a gFID of 3.2 on ImageNet with only 64 training epochs. Our results suggest that choosing a suitable latent distribution structure (achieved via distribution-level alignment), rather than relying on fixed priors, is key to bridging the gap between easy-to-model latents and high-fidelity image synthesis. Code is available at https://github.com/sen-ye/dmvae.
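The abstract does not include code, but the distribution-matching constraint is easy to picture with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it stands in for the paper's distribution matching constraint with an RBF-kernel MMD penalty between encoder latents and samples from a reference distribution; `encoder`, `decoder`, and `reference_latents` are assumed interfaces.

```python
# Minimal sketch (not the authors' code): shape encoder latents toward an
# arbitrary reference distribution with an RBF-kernel MMD penalty.
import torch
import torch.nn.functional as F

def rbf_mmd2(x, y, sigma=1.0):
    """Biased MMD^2 estimate between sample sets x, y of shape (n, d)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def dmvae_style_loss(encoder, decoder, images, reference_latents, lam=1.0):
    z = encoder(images)                      # latents to be shaped
    rec = F.mse_loss(decoder(z), images)     # reconstruction term
    # Distribution-level alignment replaces the fixed Gaussian KL of a
    # vanilla VAE; reference_latents could be SSL features, diffusion
    # noise, or samples from any other prior.
    match = rbf_mmd2(z.flatten(1), reference_latents.flatten(1))
    return rec + lam * match
```

Swapping `reference_latents` between Gaussian noise and SSL-derived features is what lets this kind of setup probe which latent distributions are easiest to model.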
Related papers
- Better Source, Better Flow: Learning Condition-Dependent Source Distribution for Flow Matching [34.811045663987805]
Flow matching has emerged as a promising alternative to diffusion-based generative models. We show that principled design of the source distribution is not only feasible but also beneficial at the scale of modern text-to-image systems.
arXiv Detail & Related papers (2026-02-05T18:08:20Z)
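As a rough illustration of the idea above (assumed interfaces, not the paper's code): a small network predicts per-condition source parameters, and the usual flow-matching regression runs from that source instead of N(0, I).

```python
# Sketch: flow matching with a learned, condition-dependent Gaussian
# source. `vfield` and `source_net` are hypothetical modules.
import torch

def flow_matching_step(vfield, source_net, x1, cond):
    mu, log_sigma = source_net(cond)                 # per-condition source
    x0 = mu + log_sigma.exp() * torch.randn_like(x1)
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))
    xt = (1 - t) * x0 + t * x1                       # straight-line path
    return (vfield(xt, t, cond) - (x1 - x0)).pow(2).mean()
```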
- DT-UFC: Universal Large Model Feature Coding via Peaky-to-Balanced Distribution Transformation [50.32808229665005]
We present the first systematic study on universal feature coding for large models. The key challenge lies in the inherently diverse and distributionally incompatible nature of features extracted from different models. We propose a learned peaky-to-balanced distribution transformation, which reshapes highly skewed feature distributions into a common, balanced target space.
arXiv Detail & Related papers (2025-06-19T17:43:32Z)
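For intuition about the peaky-to-balanced transformation described above, here is a non-learned, 1-D analogue: an empirical-CDF (rank) map flattens a heavily peaked feature distribution into a balanced uniform one. The paper learns such a transformation for high-dimensional features; this sketch only illustrates the target behaviour.

```python
# Illustration only: rank-based (empirical-CDF) flattening of a peaky
# 1-D feature distribution into an approximately uniform one.
import numpy as np

def peaky_to_balanced(features):
    ranks = np.argsort(np.argsort(features))    # rank of each value, 0..n-1
    return (ranks + 0.5) / len(features)        # ~Uniform(0, 1)

feats = np.random.laplace(0.0, 0.1, size=10_000)   # sharp peak at zero
balanced = peaky_to_balanced(feats)                # balanced, easier to code
```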
- Direct Distributional Optimization for Provable Alignment of Diffusion Models [39.048284342436666]
We introduce a novel alignment method for diffusion models from a distribution optimization perspective. We first formulate the problem as a generic regularized loss minimization over probability distributions. We enable sampling from the learned distribution by approximating its score function via Doob's $h$-transform technique.
arXiv Detail & Related papers (2025-02-05T07:35:15Z)
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers. We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions. This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
- Symmetric Equilibrium Learning of VAEs [56.56929742714685]
We view variational autoencoders (VAEs) as decoder-encoder pairs, which map distributions in the data space to distributions in the latent space and vice versa.
We propose a Nash equilibrium learning approach, which is symmetric with respect to the encoder and decoder and allows learning VAEs in situations where both the data and the latent distributions are accessible only by sampling.
arXiv Detail & Related papers (2023-07-19T10:27:34Z)
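A schematic of the symmetric scheme above, with purely illustrative interfaces (the paper's actual objective and update rules may differ): each of the two players is trained only on samples, one from real data through the encoder, the other from prior draws through the decoder.

```python
# Schematic only: symmetric, sampling-based updates for a decoder-encoder
# pair. encoder/decoder expose sample() and log_prob() for illustration.
import torch

def equilibrium_step(encoder, decoder, prior, x_real):
    # Decoder player: explain real data under latents sampled by the encoder.
    z = encoder.sample(x_real)
    dec_loss = -decoder.log_prob(x_real, given=z).mean()
    # Encoder player: recover latents on data sampled by the decoder.
    z0 = prior.sample(x_real.shape[0])
    x_fake = decoder.sample(z0)
    enc_loss = -encoder.log_prob(z0, given=x_fake).mean()
    return dec_loss, enc_loss   # step each player's optimizer separately
```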
- Dior-CVAE: Pre-trained Language Models and Diffusion Priors for Variational Dialog Generation [70.2283756542824]
Dior-CVAE is a hierarchical conditional variational autoencoder (CVAE) with diffusion priors.
We employ a diffusion model to increase the complexity of the prior distribution and its compatibility with the distributions produced by a PLM.
Experiments across two commonly used open-domain dialog datasets show that our method can generate more diverse responses without large-scale dialog pre-training.
arXiv Detail & Related papers (2023-05-24T11:06:52Z)
- The Score-Difference Flow for Implicit Generative Modeling [1.1929584800629673]
Implicit generative modeling (IGM) aims to produce samples of synthetic data matching a target data distribution. Recent work has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution. We present the score difference between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them.
arXiv Detail & Related papers (2023-04-25T15:21:12Z)
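The score-difference flow is easy to picture in one dimension. In the sketch below (an illustration only, with the evolving density's score approximated by a Gaussian fit so it is available in closed form), source samples drift along the difference of scores toward the target.

```python
# 1-D illustration of the score-difference flow between unit Gaussians.
# Approximating the current density's score by a Gaussian fit is exact
# here, because the flow reduces to a rigid shift of a Gaussian.
import torch

def sd_flow(x, mu_target=2.0, step=0.05, n_steps=200):
    for _ in range(n_steps):
        score_target = -(x - mu_target)    # grad log N(mu_target, 1)
        score_current = -(x - x.mean())    # grad log of current Gaussian fit
        x = x + step * (score_target - score_current)
    return x

samples = sd_flow(torch.randn(10_000))     # ends up close to N(2, 1)
```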
- Structured Uncertainty in the Observation Space of Variational Autoencoders [20.709989481734794]
In image synthesis, sampling from pixel-wise independent observation distributions produces spatially incoherent results with uncorrelated pixel noise.
We propose an alternative model for the observation space, encoding spatial dependencies via a low-rank parameterisation.
In contrast to pixel-wise independent distributions, our samples seem to contain semantically meaningful variations from the mean, allowing the prediction of multiple plausible outputs.
arXiv Detail & Related papers (2022-05-25T07:12:50Z)
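A quick sketch of what a low-rank observation model buys (illustrative shapes, not the paper's code): sampling from N(mu, L L^T + D) mixes a few shared noise directions into every pixel, so the noise is spatially correlated rather than i.i.d.

```python
# Sketch: sampling from a low-rank-plus-diagonal Gaussian observation
# model N(mu, L L^T + D); the rank-r factor induces correlated noise.
import torch

def sample_structured(mu, L, log_d, n=1):
    """mu: (p,), L: (p, r) low-rank factor, log_d: (p,) log diagonal."""
    eps_r = torch.randn(n, L.shape[1])           # r shared noise directions
    eps_p = torch.randn(n, mu.shape[0])          # independent per-pixel noise
    return mu + eps_r @ L.T + eps_p * (0.5 * log_d).exp()

p, r = 64 * 64, 16                               # pixels, covariance rank
mu = torch.zeros(p)
L = 0.1 * torch.randn(p, r)
imgs = sample_structured(mu, L, torch.full((p,), -4.0), n=8)
```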
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), which parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
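To make the parameterization above concrete, here is a toy sketch of a network whose i-th output only sees x up to position i and is read as the conditional score d log p(x_i | x_<i) / dx_i. Training with the paper's efficient divergence is omitted, and the loop-over-dimensions masking is purely for clarity (a MADE-style mask would do it in one pass).

```python
# Toy sketch of the AR-CSM parameterization: per-dimension conditional
# scores s_i = d log p(x_i | x_<i) / dx_i from a causally masked network.
import torch

class ARScore(torch.nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.dim = dim
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim),
        )
        # Row i of the mask exposes x_1..x_i to the i-th score head.
        self.register_buffer("mask", torch.tril(torch.ones(dim, dim)))

    def forward(self, x):                        # x: (batch, dim)
        scores = [self.net(x * self.mask[i])[:, i] for i in range(self.dim)]
        return torch.stack(scores, dim=1)        # (batch, dim)
```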
- Generative Model without Prior Distribution Matching [26.91643368299913]
Variational Autoencoder (VAE) and its variations are classic generative models that learn a low-dimensional latent representation constrained to satisfy some prior distribution.
We propose to let the prior match the embedding distribution rather than forcing the latent variables to fit the prior.
arXiv Detail & Related papers (2020-09-23T09:33:24Z)
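One simple instantiation of the reversed direction described above (the paper's actual construction may differ): train a plain autoencoder, then fit a density model, a Gaussian mixture here, to the embeddings and sample from it for generation.

```python
# Sketch: fit the prior to the embedding distribution (a GMM here) instead
# of forcing embeddings toward a fixed prior. Interfaces are illustrative.
import torch
from sklearn.mixture import GaussianMixture

def build_sampler(encoder, decoder, data, n_components=10):
    with torch.no_grad():
        z = encoder(data).cpu().numpy()          # aggregate posterior samples
    prior = GaussianMixture(n_components=n_components).fit(z)
    def sample(n):                               # generation: prior -> decoder
        z_new, _ = prior.sample(n)
        return decoder(torch.as_tensor(z_new, dtype=torch.float32))
    return sample
```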
- Variational Hyper-Encoding Networks [62.74164588885455]
We propose a framework called HyperVAE for encoding distributions of neural network parameters theta.
We predict the posterior distribution of the latent code, then use a matrix-network decoder to generate a posterior distribution q(theta).
arXiv Detail & Related papers (2020-05-18T06:46:09Z)
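A compact sketch of the HyperVAE idea with illustrative shapes (the paper uses a matrix-network decoder; a plain linear head stands in for it here): a flattened parameter vector theta is encoded to a latent code, and the decoder emits the parameters of a Gaussian q(theta).

```python
# Illustrative sketch of HyperVAE: encode theta to a latent code, decode
# the code to a Gaussian posterior over theta. Linear layers stand in for
# the paper's matrix-network decoder.
import torch

class HyperVAE(torch.nn.Module):
    def __init__(self, theta_dim, latent_dim=32):
        super().__init__()
        self.enc = torch.nn.Linear(theta_dim, 2 * latent_dim)  # -> mu, logvar
        self.dec = torch.nn.Linear(latent_dim, 2 * theta_dim)  # -> q(theta)

    def forward(self, theta):
        mu, logvar = self.enc(theta).chunk(2, dim=-1)
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)   # reparameterize
        theta_mu, theta_logvar = self.dec(z).chunk(2, dim=-1)
        return theta_mu, theta_logvar, mu, logvar              # ELBO pieces
```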