Multimeasurement Generative Models
- URL: http://arxiv.org/abs/2112.09822v1
- Date: Sat, 18 Dec 2021 02:11:36 GMT
- Title: Multimeasurement Generative Models
- Authors: Saeed Saremi, Rupesh Kumar Srivastava
- Abstract summary: We map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_\mathbf{Y}$ in $\mathbb{R}^{Md}$, obtained by convolving $p_X$ with a fixed factorial kernel.
- Score: 7.502947376736449
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We formally map the problem of sampling from an unknown distribution with
density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling
$p_\mathbf{Y}$ in $\mathbb{R}^{Md}$ obtained by convolving $p_X$ with a fixed
factorial kernel: $p_\mathbf{Y}$ is referred to as M-density and the factorial
kernel as multimeasurement noise model (MNM). The M-density is smoother than
$p_X$, easier to learn and sample from, yet for large $M$ the two problems are
mathematically equivalent since $X$ can be estimated exactly given
$\mathbf{Y}=\mathbf{y}$ using the Bayes estimator
$\widehat{x}(\mathbf{y})=\mathbb{E}[X\vert\mathbf{Y}=\mathbf{y}]$. To formulate
the problem, we derive $\widehat{x}(\mathbf{y})$ for Poisson and Gaussian MNMs
expressed in closed form in terms of unnormalized $p_\mathbf{Y}$. This leads to
a simple least-squares objective for learning parametric energy and score
functions. We present various parametrization schemes of interest, including
one in which studying Gaussian M-densities directly leads to multidenoising
autoencoders--this is the first theoretical connection made between denoising
autoencoders and empirical Bayes in the literature. Samples from $p_X$ are
obtained by walk-jump sampling (Saremi & Hyvarinen, 2019) via underdamped
Langevin MCMC (walk) to sample from $p_\mathbf{Y}$ and the multimeasurement
Bayes estimation of $X$ (jump). We study permutation invariant Gaussian
M-densities on MNIST, CIFAR-10, and FFHQ-256 datasets, and demonstrate the
effectiveness of this framework for realizing fast-mixing stable Markov chains
in high dimensions.
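In the single-measurement Gaussian case ($M=1$, $Y = X + \mathcal{N}(0,\sigma^2 I_d)$), the Bayes estimator above reduces to the classical Tweedie/Miyasawa identity, which depends on $p_Y$ only through its score, so an unnormalized (energy-based) model suffices:
$$\widehat{x}(y) = \mathbb{E}[X \mid Y = y] = y + \sigma^{2}\,\nabla_{y}\log p_{Y}(y).$$
The paper's closed-form estimators for Gaussian and Poisson MNMs generalize this identity to $M$ noisy measurements.

The sketch below illustrates walk-jump sampling in this $M=1$ Gaussian setting on a toy 2-D Gaussian mixture, for which the score of $p_Y$ is available in closed form. The Euler discretization of underdamped Langevin dynamics, the step size, the friction coefficient, and all names are illustrative assumptions, not the paper's algorithm or hyperparameters.

```python
# Minimal walk-jump sampling sketch (M = 1, Gaussian noise) on a toy mixture.
# Assumed/illustrative: toy p_X, Euler discretization, delta, gamma, "every".
import numpy as np

rng = np.random.default_rng(0)

# Toy p_X: mixture of two isotropic Gaussians in R^2 (equal weights).
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
tau2 = 0.25           # component variance of p_X
sigma = 1.0           # Gaussian measurement-noise level
s2 = tau2 + sigma**2  # p_Y = p_X * N(0, sigma^2 I) is a mixture with variance s2

def score_pY(y):
    """Exact score grad log p_Y(y) of the smoothed mixture density."""
    diffs = y - means                            # (2, 2): y - mu_k for each component
    logw = -0.5 * np.sum(diffs**2, axis=1) / s2  # unnormalized log responsibilities
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return -(w[:, None] * diffs).sum(axis=0) / s2

def walk_jump(n_samples=50, every=40, delta=0.05, gamma=1.0):
    """Walk: underdamped Langevin MCMC on p_Y (simple Euler discretization).
    Jump: Bayes/Tweedie estimator x_hat = y + sigma^2 * score(y)."""
    y, v = rng.normal(size=2), np.zeros(2)
    xs = []
    for t in range(n_samples * every):
        y = y + delta * v
        v = (v + delta * (score_pY(y) - gamma * v)
             + np.sqrt(2.0 * gamma * delta) * rng.normal(size=2))
        if (t + 1) % every == 0:
            xs.append(y + sigma**2 * score_pY(y))  # jump: denoise y -> x_hat
    return np.array(xs)

print(walk_jump()[:5].round(2))
```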
Related papers
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - A Fourier Approach to Mixture Learning [46.995354373649675]
We give an algorithm that efficiently learns the means in $d = O(\log k/\log\log k)$ dimensions under separation $d/\sqrt{\log k}$ (modulo factors).
Our results can be easily extended to learning mixtures of non-Gaussian distributions, under a mild condition on the Fourier spectrum of the distribution.
arXiv Detail & Related papers (2022-10-05T17:35:46Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Tight Bounds on the Hardness of Learning Simple Nonparametric Mixtures [9.053430799456587]
We study the problem of learning nonparametric distributions in a finite mixture.
We establish tight bounds on the sample complexity for learning the component distributions in such models.
arXiv Detail & Related papers (2022-03-28T23:53:48Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random feature technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - From Smooth Wasserstein Distance to Dual Sobolev Norm: Empirical
Approximation and Statistical Applications [18.618590805279187]
We show that $\mathsf{W}_p^{(\sigma)}$ is controlled by a $p$th-order smooth dual Sobolev norm $\mathsf{d}_p^{(\sigma)}$.
We derive the limit distribution of $\sqrt{n}\,\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$ in all dimensions $d$, when $\mu$ is sub-Gaussian.
arXiv Detail & Related papers (2021-01-11T17:23:24Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z) - Convergence of Sparse Variational Inference in Gaussian Processes
Regression [29.636483122130027]
We show that a method with an overall computational cost of $\mathcal{O}(N(\log N)^{2D}(\log\log N)^2)$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)