Sampling Multimodal Distributions with the Vanilla Score: Benefits of
  Data-Based Initialization
- URL: http://arxiv.org/abs/2310.01762v1
- Date: Tue, 3 Oct 2023 03:06:59 GMT
- Title: Sampling Multimodal Distributions with the Vanilla Score: Benefits of
  Data-Based Initialization
- Authors: Frederic Koehler, Thuy-Duong Vuong
- Abstract summary: Hyvärinen proposed vanilla score matching as a way to learn distributions from data.
We prove that the Langevin diffusion with early stopping, initialized at the empirical distribution, and run on a score function estimated from data successfully generates natural multimodal distributions.
- Score: 19.19974210314107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   There is a long history, as well as a recent explosion of interest, in
statistical and generative modeling approaches based on score functions --
derivatives of the log-likelihood of a distribution. In seminal works,
Hyvärinen proposed vanilla score matching as a way to learn distributions
from data by computing an estimate of the score function of the underlying
ground truth, and established connections between this method and established
techniques like Contrastive Divergence and Pseudolikelihood estimation. It is
by now well-known that vanilla score matching has significant difficulties
learning multimodal distributions. Although there are various ways to overcome
this difficulty, the following question has remained unanswered -- is there a
natural way to sample multimodal distributions using just the vanilla score?
Inspired by a long line of related experimental works, we prove that the
Langevin diffusion with early stopping, initialized at the empirical
distribution, and run on a score function estimated from data successfully
generates natural multimodal distributions (mixtures of log-concave
distributions).
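
The sampling procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a one-dimensional, equal-weight mixture of two Gaussians, uses the exact score of that mixture as a stand-in for a score estimated from data, and picks the step size and number of steps arbitrarily. The function names `mixture_score` and `langevin_from_data` are hypothetical.

```python
import numpy as np


def mixture_score(x, means, sigma):
    """Exact score (d/dx of the log-density) of an equal-weight 1-D Gaussian mixture.

    Assumption: this stands in for a score function estimated from data.
    """
    diffs = means[None, :] - x[:, None]        # shape (n, k): mu_i - x
    logw = -0.5 * (diffs / sigma) ** 2         # unnormalized log responsibilities
    logw -= logw.max(axis=1, keepdims=True)    # subtract the max for numerical stability
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)          # posterior weight of each component
    return (w * diffs).sum(axis=1) / sigma**2


def langevin_from_data(data, score_fn, step, n_steps, rng):
    """Discretized Langevin diffusion started at the data points and stopped early.

    Each particle starts at one sample (the empirical distribution) and follows
    x <- x + step * score(x) + sqrt(2 * step) * noise for a fixed, small number of steps.
    """
    x = data.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step * score_fn(x) + np.sqrt(2.0 * step) * noise
    return x


rng = np.random.default_rng(0)
means = np.array([-4.0, 4.0])   # illustrative, well-separated modes
sigma = 1.0
labels = rng.integers(0, 2, size=2000)
data = means[labels] + sigma * rng.standard_normal(2000)

samples = langevin_from_data(
    data,
    lambda x: mixture_score(x, means, sigma),
    step=0.01,      # illustrative step size
    n_steps=200,    # early stopping: total time 2, far too short to cross between modes
    rng=rng,
)
frac_right_data = float((data > 0).mean())
frac_right_samples = float((samples > 0).mean())
```

Because the chain starts at the data and runs only briefly, each particle mixes within its own mode while the proportion of particles in each mode stays close to the proportion in the data, which is the intuition behind data-based initialization.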
 
      
Related papers
        - A Malliavin calculus approach to score functions in diffusion generative   models [5.124031464211652]
 We derive an exact, closed-form expression for the score function for a broad class of nonlinear diffusion generative models.
Our results can be extended to broader classes of differential equations, opening new directions for the development of score-based diffusion generative models.
 arXiv  Detail & Related papers  (2025-07-08T00:20:57Z)
- Finite Sample Analysis of Distributional TD Learning with Linear   Function Approximation [21.999445060856278]
 We show that the sample complexity of linear distributional TD learning matches that of the classic linear TD learning.
Our findings provide new insights into the statistical efficiency of distributional reinforcement learning algorithms.
 arXiv  Detail & Related papers  (2025-02-20T00:53:22Z)
- Diffusion Attribution Score: Evaluating Training Data Influence in   Diffusion Models [22.39558434131574]
 Existing data attribution methods for diffusion models typically quantify the contribution of a training sample.
We argue that the direct usage of diffusion loss cannot represent such a contribution accurately due to the calculation of diffusion loss.
We propose Diffusion Attribution Score (DAS) to measure the direct comparison between predicted distributions with an attribution score.
 arXiv  Detail & Related papers  (2024-10-24T10:58:17Z)
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional   Samplers [49.97755400231656]
 We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
 arXiv  Detail & Related papers  (2024-10-17T16:42:12Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
 Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
 arXiv  Detail & Related papers  (2024-03-11T13:44:49Z)
- Score-based Source Separation with Applications to Digital Communication
  Signals [72.6570125649502]
 We propose a new method for separating superimposed sources using diffusion-based generative models.
Motivated by applications in radio-frequency (RF) systems, we are interested in sources with underlying discrete nature.
Our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme.
 arXiv  Detail & Related papers  (2023-06-26T04:12:40Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
 We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
 arXiv  Detail & Related papers  (2022-11-30T05:33:29Z)
- Collaborative Learning of Distributions under Heterogeneity and
  Communication Constraints [35.82172666266493]
 In machine learning, users often have to collaborate to learn distributions that generate the data.
We propose a novel two-stage method named SHIFT: First, the users collaborate by communicating with the server to learn a central distribution.
Then, the learned central distribution is fine-tuned to estimate the individual distributions of users.
 arXiv  Detail & Related papers  (2022-06-01T18:43:06Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
 We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
 arXiv  Detail & Related papers  (2021-10-20T12:25:22Z)
- Probabilistic Kolmogorov-Arnold Network [1.4732811715354455]
 The present paper proposes a method for estimating probability distributions of the outputs in the case of aleatoric uncertainty.
The suggested approach covers input-dependent probability distributions of the outputs, as well as the variation of the distribution type with the inputs.
Although the method is applicable to any regression model, the present paper combines it with KANs, since the specific structure of KANs leads to computationally efficient model construction.
 arXiv  Detail & Related papers  (2021-04-04T23:49:15Z)
- DEMI: Discriminative Estimator of Mutual Information [5.248805627195347]
 Estimating mutual information between continuous random variables is often intractable and challenging for high-dimensional data.
Recent progress has leveraged neural networks to optimize variational lower bounds on mutual information.
Our approach is based on training a classifier that provides the probability that a data sample pair is drawn from the joint distribution.
 arXiv  Detail & Related papers  (2020-10-05T04:19:27Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
 We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
 arXiv  Detail & Related papers  (2020-07-24T05:18:17Z)
- The continuous categorical: a novel simplex-valued exponential family [23.983555024375306]
 We show that standard choices for simplex-valued data suffer from a number of limitations, including bias and numerical issues.
We resolve these limitations by introducing a novel exponential family of distributions for modeling simplex-valued data.
Unlike the Dirichlet and other typical choices, the continuous categorical results in a well-behaved probabilistic loss function.
 arXiv  Detail & Related papers  (2020-02-20T04:28:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     