On the Convergence Rate of Gaussianization with Random Rotations
- URL: http://arxiv.org/abs/2306.13520v1
- Date: Fri, 23 Jun 2023 14:35:39 GMT
- Title: On the Convergence Rate of Gaussianization with Random Rotations
- Authors: Felix Draxler, Lars Kühmichel, Armand Rousselot, Jens Müller,
Christoph Schnörr, Ullrich Köthe
- Abstract summary: We show analytically that the number of required layers scales linearly with the dimension for Gaussian input.
We find the same linear increase in cost for arbitrary input $p(x)$, but observe favorable scaling for some distributions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gaussianization is a simple generative model that can be trained without
backpropagation. It has shown compelling performance on low dimensional data.
As the dimension increases, however, it has been observed that the convergence
speed slows down. We show analytically that the number of required layers
scales linearly with the dimension for Gaussian input. We argue that this is
because the model is unable to capture dependencies between dimensions.
Empirically, we find the same linear increase in cost for arbitrary input
$p(x)$, but observe favorable scaling for some distributions. We explore
potential speed-ups and formulate challenges for further research.
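To make the setup concrete, here is a minimal sketch of a Gaussianization pipeline in the spirit the abstract describes: each layer applies a random rotation followed by per-dimension marginal Gaussianization. The function names and the empirical-CDF marginal transform are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm
from scipy.linalg import qr

def random_rotation(d, rng):
    # Draw a Haar-distributed random orthogonal matrix via QR of a Gaussian matrix.
    A = rng.standard_normal((d, d))
    Q, R = qr(A)
    return Q * np.sign(np.diag(R))  # sign fix so the distribution is uniform

def marginal_gaussianize(X):
    # Map each dimension to N(0, 1) through its empirical CDF.
    n, _ = X.shape
    ranks = X.argsort(axis=0).argsort(axis=0)  # per-dimension rank of each sample
    u = (ranks + 0.5) / n                      # empirical CDF values in (0, 1)
    return norm.ppf(u)                         # inverse standard-normal CDF

def gaussianization(X, n_layers, seed=0):
    # Alternate marginal Gaussianization with random rotations.
    rng = np.random.default_rng(seed)
    for _ in range(n_layers):
        X = marginal_gaussianize(X)
        X = X @ random_rotation(X.shape[1], rng).T
    return X
```

The rotation is what couples the dimensions: marginal Gaussianization alone cannot remove dependencies between them, which is exactly the limitation the abstract identifies as the cause of the linear layer-count scaling.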
Related papers
- WeSpeR: Computing non-linear shrinkage formulas for the weighted sample covariance [0.0]
We use theoretical properties of the sample spectrum to derive the WeSpeR algorithm.
We significantly speed up non-linear shrinkage in dimensions higher than $1000$.
arXiv Detail & Related papers (2024-10-18T12:26:51Z) - von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z) - Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - Scaling Riemannian Diffusion Models [68.52820280448991]
We show that our method enables us to scale to high dimensional tasks on nontrivial manifold.
We model QCD densities on $SU(n)$ lattices and contrastively learned embeddings on high dimensional hyperspheres.
arXiv Detail & Related papers (2023-10-30T21:27:53Z) - Implicit Manifold Gaussian Process Regression [49.0787777751317]
Gaussian process regression is widely used to provide well-calibrated uncertainty estimates.
It struggles with high-dimensional data, which in practice often lies on an implicit low-dimensional manifold.
In this paper we propose a technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way.
arXiv Detail & Related papers (2023-10-30T09:52:48Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
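The core trick this blurb describes, replacing an explicit matrix inversion with a few iterations of a linear solver, can be sketched in isolation. Below is a generic conjugate-gradient solve (my own minimal version, not the authors' implementation): gradients of a latent Gaussian model's likelihood need terms like $\Sigma^{-1}b$, and CG delivers them from matrix-vector products alone, so the iterations can also be unrolled and backpropagated through.

```python
import numpy as np

def conjugate_gradient(A, b, n_iter=50, tol=1e-8):
    # Solve A x = b for symmetric positive-definite A without forming A^{-1}.
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(n_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)   # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p  # conjugate direction update
        rs = rs_new
    return x
```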
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression [41.48538038768993]
We focus on the problem of kernel ridge regression for dot-product kernels.
We observe a peak in the learning curve whenever $m \approx d^r/r!$ for any integer $r$, leading to multiple sample-wise descents and nontrivial behavior at multiple scales.
arXiv Detail & Related papers (2022-05-30T04:21:31Z) - Sparse Gaussian Processes with Spherical Harmonic Features [14.72311048788194]
We introduce a new class of inter-domain variational Gaussian processes (GPs).
Our inference scheme is comparable to variational Fourier features, but it does not suffer from the curse of dimensionality.
Our experiments show that our model fits a regression dataset with 6 million entries two orders of magnitude faster.
arXiv Detail & Related papers (2020-06-30T10:19:32Z) - Linear-time inference for Gaussian Processes on one dimension [17.77516394591124]
We investigate data sampled on one dimension for which state-space models are popular due to their linearly-scaling computational costs.
We provide the first general proof of the conjecture that state-space models can approximate any one-dimensional Gaussian process.
We develop parallelized algorithms for performing inference and learning in the LEG model, test the algorithm on real and synthetic data, and demonstrate scaling to datasets with billions of samples.
arXiv Detail & Related papers (2020-03-11T23:20:13Z) - Randomly Projected Additive Gaussian Processes for Regression [37.367935314532154]
We use additive sums of kernels for GP regression, where each kernel operates on a different random projection of its inputs.
We prove this convergence and its rate, and propose a deterministic approach that converges more quickly than purely random projections.
arXiv Detail & Related papers (2019-12-30T07:26:18Z)
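The additive construction in the last blurb is simple enough to sketch directly: average base kernels, each evaluated on a different random low-dimensional projection of the inputs. The helper names and the RBF base kernel are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    # Squared-exponential kernel between the rows of X1 and X2.
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def projected_additive_kernel(X1, X2, n_proj=10, proj_dim=1, seed=0):
    # Average RBF kernels, each acting on a different random projection.
    rng = np.random.default_rng(seed)
    d = X1.shape[1]
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for _ in range(n_proj):
        P = rng.standard_normal((d, proj_dim)) / np.sqrt(d)
        K += rbf(X1 @ P, X2 @ P)
    return K / n_proj
```

Each summand is a valid kernel, so the average is too; a deterministic choice of projections, as the blurb notes, can converge faster than purely random ones.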
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.