Beyond Linear Diffusions: Improved Representations for Rare Conditional Generative Modeling
- URL: http://arxiv.org/abs/2510.02499v1
- Date: Thu, 02 Oct 2025 19:06:14 GMT
- Title: Beyond Linear Diffusions: Improved Representations for Rare Conditional Generative Modeling
- Authors: Kulunu Dharmakeerthi, Yousef El-Laham, Henry H. Wong, Vamsi K. Potluru, Changhong He, Taosong He,
- Abstract summary: We show it is possible to adapt the data representation and forward scheme so that the sample complexity of learning a score-based generative model is small in low probability regions of the conditioning space.<n>We show how diffusion with a data-driven choice of nonlinear drift term is best suited to model tail events under an appropriate representation of the data.
- Score: 4.527435625329663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have emerged as powerful generative frameworks with widespread applications across machine learning and artificial intelligence systems. While current research has predominantly focused on linear diffusions, these approaches can face significant challenges when modeling a conditional distribution, $P(Y|X=x)$, when $P(X=x)$ is small. In these regions, few samples, if any, are available for training, thus modeling the corresponding conditional density may be difficult. Recognizing this, we show it is possible to adapt the data representation and forward scheme so that the sample complexity of learning a score-based generative model is small in low probability regions of the conditioning space. Drawing inspiration from conditional extreme value theory we characterize this method precisely in the special case in the tail regions of the conditioning variable, $X$. We show how diffusion with a data-driven choice of nonlinear drift term is best suited to model tail events under an appropriate representation of the data. Through empirical validation on two synthetic datasets and a real-world financial dataset, we demonstrate that our tail-adaptive approach significantly outperforms standard diffusion models in accurately capturing response distributions at the extreme tail conditions.
Related papers
- DS-Diffusion: Data Style-Guided Diffusion Model for Time-Series Generation [3.7098771725459336]
We propose a data style-guided diffusion model (DS-Diffusion) for time series generation tasks.<n>The DS-Diffusion avoids retraining the entire framework to introduce conditional guidance.<n>The generated samples can clearly indicate the data style from which they originate.
arXiv Detail & Related papers (2025-09-23T03:06:39Z) - Efficient Flow Matching using Latent Variables [9.363347684114474]
We show that $texttLatent-CFM$ exhibits improved generation quality with significantly less training and computation than state-of-the-art flow matching models.<n>We also consider generative modeling of spatial fields stemming from physical processes.
arXiv Detail & Related papers (2025-05-07T14:59:23Z) - Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces [0.0]
We show that Bayesian flow network is capable of effortlessly generating high quality out-of-distribution samples.<n>We introduce a semi-autoregressive training/sampling method that helps to enhance the model performance and surpass the state-of-the-art models.
arXiv Detail & Related papers (2024-12-16T04:43:54Z) - On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations.
We propose an autoregressive sampling approach that significantly improves performance in forecasting.
We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z) - Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z) - Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory [87.00653989457834]
Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning.
Despite the empirical success, theory of conditional diffusion models is largely missing.
This paper bridges the gap by presenting a sharp statistical theory of distribution estimation using conditional diffusion models.
arXiv Detail & Related papers (2024-03-18T17:08:24Z) - Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z) - Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces.
We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z) - User-defined Event Sampling and Uncertainty Quantification in Diffusion
Models for Physical Dynamical Systems [49.75149094527068]
We show that diffusion models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems.
We develop a probabilistic approximation scheme for the conditional score function which converges to the true distribution as the noise level decreases.
We are able to sample conditionally on nonlinear userdefined events at inference time, and matches data statistics even when sampling from the tails of the distribution.
arXiv Detail & Related papers (2023-06-13T03:42:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.