Related papers: Learning and Inference in Imaginary Noise Models

URL: http://arxiv.org/abs/2005.09047v3
Date: Fri, 5 Jun 2020 20:05:03 GMT
Title: Learning and Inference in Imaginary Noise Models
Authors: Saeed Saremi
Abstract summary: A notion of smoothed variational inference emerges where the smoothing is implicitly enforced by the noise model of the decoder. This is the concept of imaginary noise model, where the noise model dictates the functional form of the variational lower bound $\mathcal{L}(\sigma)$, but the noisy data are never seen during learning. We report an intriguing power law $\mathcal{D}_{\rm KL} \sim \sigma^{-\nu}$ for the learned models and we study the inference in the $\sigma$-VAE for unseen noisy data.
Score: 1.599072005190786
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Inspired by recent developments in learning smoothed densities with empirical Bayes, we study variational autoencoders with a decoder that is tailored for the random variable $Y=X+N(0,\sigma^2 I_d)$. A notion of smoothed variational inference emerges where the smoothing is implicitly enforced by the noise model of the decoder; "implicit", since during training the encoder only sees clean samples. This is the concept of imaginary noise model, where the noise model dictates the functional form of the variational lower bound $\mathcal{L}(\sigma)$, but the noisy data are never seen during learning. The model is named $\sigma$-VAE. We prove that all $\sigma$-VAEs are equivalent to each other via a simple $\beta$-VAE expansion: $\mathcal{L}(\sigma_2) \equiv \mathcal{L}(\sigma_1,\beta)$, where $\beta=\sigma_2^2/\sigma_1^2$. We prove a similar result for the Laplace distribution in exponential families. Empirically, we report an intriguing power law $\mathcal{D}_{\rm KL} \sim \sigma^{-\nu}$ for the learned models and we study the inference in the $\sigma$-VAE for unseen noisy data. The experiments were performed on MNIST, where we show that quite remarkably the model can make reasonable inferences on extremely noisy samples even though it has not seen any during training. The vanilla VAE completely breaks down in this regime. We finish with a hypothesis (the XYZ hypothesis) on the findings here.
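
One way to read the equivalence $\mathcal{L}(\sigma_2) \equiv \mathcal{L}(\sigma_1,\beta)$ stated in the abstract is the minimal numerical sketch below. It assumes an isotropic Gaussian decoder and a per-sample bound of the form (reconstruction log-likelihood) minus $\beta$ times the KL term. The function `elbo`, the variables `sq_err` and `kl`, and all numeric values are illustrative assumptions, not taken from the paper. Under these assumptions, $\beta\,\mathcal{L}(\sigma_2)$ and $\mathcal{L}(\sigma_1,\beta)$ differ only by a constant that does not depend on the encoder or decoder, so the two objectives share the same maximizers.

```python
import numpy as np


def elbo(sq_err, kl, sigma, d, beta=1.0):
    """Per-sample bound for an assumed Gaussian decoder N(mu(z), sigma^2 I_d).

    sq_err : squared reconstruction error ||x - mu(z)||^2
    kl     : KL divergence between the approximate posterior and the prior
    beta   : weight on the KL term (beta = 1 gives the usual lower bound)
    """
    log_norm = 0.5 * d * np.log(2.0 * np.pi * sigma**2)
    return -sq_err / (2.0 * sigma**2) - log_norm - beta * kl


rng = np.random.default_rng(0)
d = 784                       # e.g. flattened 28x28 images (assumption)
sigma1, sigma2 = 0.5, 2.0     # arbitrary illustrative noise levels
beta = sigma2**2 / sigma1**2  # the beta named in the abstract

# Stand-ins for the quantities different models would produce.
sq_err = rng.uniform(0.0, 100.0, size=5)
kl = rng.uniform(0.0, 50.0, size=5)

lhs = beta * elbo(sq_err, kl, sigma2, d)      # rescaled L(sigma_2)
rhs = elbo(sq_err, kl, sigma1, d, beta=beta)  # L(sigma_1, beta)

# The gap is the same for every (sq_err, kl) pair: it only involves the
# Gaussian normalizing constants, which are independent of the model.
gap = lhs - rhs
expected_gap = (0.5 * d * np.log(2.0 * np.pi * sigma1**2)
                - beta * 0.5 * d * np.log(2.0 * np.pi * sigma2**2))
print(np.allclose(gap, expected_gap))
```

The sketch only checks the algebra for a Gaussian noise model; the abstract also claims a similar result for the Laplace distribution, which is not covered here.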

Related papers

Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination [65.37519531362157]
We show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.
arXiv Detail & Related papers (2025-10-12T15:42:44Z)
Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes [0.5892638927736114]
We prove an algorithm exists that, with high probability, outputs a simplex within $\ell$ or total variation (TV) distance at most $\varepsilon$ from the true simplex. In the noiseless scenario, our lower bound $n \ge \Omega(K/\varepsilon)$ matches known upper bounds up to constant factors.
arXiv Detail & Related papers (2025-06-11T18:35:38Z)
Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models [65.71506381302815]
We propose to amortize the cost of sampling from a posterior distribution of the form $p(\mathbf{x}\mid\mathbf{y}) \propto p_\theta(\mathbf{x})$. For many models and constraints of interest, the posterior in the noise space is smoother than the posterior in the data space, making it more amenable to such amortized inference.
arXiv Detail & Related papers (2025-02-10T19:49:54Z)
Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis [55.561961365113554]
3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness for novel view synthesis (NVS). However, the 3DGS model tends to overfit when trained with sparse posed views, limiting its generalization ability to novel views. We present a Self-Ensembling Gaussian Splatting (SE-GS) approach to alleviate the overfitting problem. Our approach improves NVS quality with few-shot training views, outperforming existing state-of-the-art methods.
arXiv Detail & Related papers (2024-10-31T18:43:48Z)
Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, $\mathscr{F}$ is a weakly sub-Gaussian class. Our result holds whether the problem is realizable or not and we refer to this as a near mixing-free rate, since direct dependence on mixing is relegated to an additive higher order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z)
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples. We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures. This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notation -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z)
Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture Models [12.746888269949407]
We consider a high-dimensional mean estimation problem over a binary hidden Markov model. We establish a nearly minimax optimal (up to logarithmic factors) estimation error rate, as a function of $\|\theta_*\|,\delta,d,n$.
arXiv Detail & Related papers (2022-06-06T09:34:04Z)
Structure Learning in Graphical Models from Indirect Observations [17.521712510832558]
This paper considers learning of the graphical structure of a $p$-dimensional random vector $X \in R^p$ using both parametric and non-parametric methods. Under mild conditions, we show that our graph-structure estimator can obtain the correct structure.
arXiv Detail & Related papers (2022-05-06T19:24:44Z)
Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably [42.427869499882206]
We parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in R^{d\times d}$. We then show that under mild conditions, the estimator, obtained by the randomly perturbed gradient descent algorithm using the square loss function, attains a mean square error of $O(\sigma^2/d)$. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^2)$.
arXiv Detail & Related papers (2022-02-07T21:53:51Z)
Multimeasurement Generative Models [7.502947376736449]
We map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_\mathbf{Y}$ in $\mathbb{R}^{Md}$ obtained by convolving $p_X$ with a fixed factorial kernel.
arXiv Detail & Related papers (2021-12-18T02:11:36Z)
The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean and unknown covariance Gaussian $\mathcal{N}(0, \Sigma)$. In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples. We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma
arXiv Detail & Related papers (2020-12-31T18:24:41Z)
Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss. For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning [175.34232468746245]
We introduce a parameterization method called Neural Bayes. It allows computing statistical quantities that are in general difficult to compute. We show two independent use cases for this parameterization.
arXiv Detail & Related papers (2020-02-20T22:28:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.