Whitening Convergence Rate of Coupling-based Normalizing Flows
- URL: http://arxiv.org/abs/2210.14032v1
- Date: Tue, 25 Oct 2022 14:10:34 GMT
- Title: Whitening Convergence Rate of Coupling-based Normalizing Flows
- Authors: Felix Draxler, Christoph Schnörr, Ullrich Köthe
- Abstract summary: Existing work shows that such flows weakly converge to arbitrary data distributions.
We prove that all coupling-based normalizing flows perform whitening of the data distribution.
We derive corresponding convergence bounds that show a linear convergence rate in the depth of the flow.
- Score: 1.1279808969568252
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Coupling-based normalizing flows (e.g. RealNVP) are a popular family of
normalizing flow architectures that work surprisingly well in practice. This
calls for theoretical understanding. Existing work shows that such flows weakly
converge to arbitrary data distributions. However, they make no statement about
the stricter convergence criterion used in practice, the maximum likelihood
loss. For the first time, we make a quantitative statement about this kind of
convergence: We prove that all coupling-based normalizing flows perform
whitening of the data distribution (i.e. diagonalize the covariance matrix) and
derive corresponding convergence bounds that show a linear convergence rate in
the depth of the flow. Numerical experiments demonstrate the implications of
our theory and point at open questions.
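To make the whitening claim concrete, here is a minimal sketch (not the authors' code): it fits a small RealNVP-style stack of affine coupling blocks to a correlated 2-D Gaussian by maximum likelihood under a standard normal base, then inspects the covariance of the transformed data, whose off-diagonal entries should shrink as training proceeds. The PyTorch implementation, the toy data, and all names (`AffineCoupling`, `Flow`, `nll`) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: affine coupling flow trained by maximum likelihood on
# correlated Gaussian data; the latent covariance should become (nearly) diagonal.
import torch
import torch.nn as nn

torch.manual_seed(0)

class AffineCoupling(nn.Module):
    """One coupling block: keep the first half fixed, affinely transform the rest."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.d)),
        )

    def forward(self, x):
        xa, xb = x[:, :self.d], x[:, self.d:]
        s, t = self.net(xa).chunk(2, dim=1)
        s = torch.tanh(s)                      # keep scales well-conditioned
        yb = xb * torch.exp(s) + t
        log_det = s.sum(dim=1)                 # log |det Jacobian| of this block
        return torch.cat([xa, yb], dim=1), log_det

class Flow(nn.Module):
    def __init__(self, dim, n_blocks):
        super().__init__()
        self.blocks = nn.ModuleList(AffineCoupling(dim) for _ in range(n_blocks))

    def forward(self, x):
        log_det = torch.zeros(x.shape[0])
        for block in self.blocks:
            x, ld = block(x)
            log_det = log_det + ld
            x = x.flip(dims=[1])               # swap halves so both get transformed
        return x, log_det

def nll(z, log_det):
    # Maximum-likelihood loss via change of variables with a standard normal base.
    log_pz = -0.5 * (z ** 2).sum(dim=1) \
             - 0.5 * z.shape[1] * torch.log(torch.tensor(2 * torch.pi))
    return -(log_pz + log_det).mean()

# Correlated 2-D Gaussian toy data.
cov = torch.tensor([[1.0, 0.8], [0.8, 1.0]])
L = torch.linalg.cholesky(cov)
data = torch.randn(4096, 2) @ L.T

flow = Flow(dim=2, n_blocks=4)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    z, log_det = flow(data)
    nll(z, log_det).backward()
    opt.step()

with torch.no_grad():
    z, _ = flow(data)
    print("data covariance:\n", torch.cov(data.T))
    print("latent covariance (off-diagonal entries should shrink toward 0):\n", torch.cov(z.T))
```

Increasing `n_blocks` should diagonalize the covariance faster, which is the qualitative behavior the paper's linear-in-depth convergence bound describes.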
Related papers
- On the Universality of Coupling-based Normalizing Flows [10.479969050570684]
We prove a distributional universality theorem for well-conditioned coupling-based normalizing flows such as RealNVP.
We further show that volume-preserving normalizing flows are not universal, characterize the distribution they learn instead, and show how to fix their expressivity.
arXiv Detail & Related papers (2024-02-09T17:51:43Z) - Good regularity creates large learning rate implicit biases: edge of
stability, balancing, and catapult [49.8719617899285]
Large learning rates, when applied to gradient descent on nonconvex objectives, yield various implicit biases, including the edge of stability.
This paper provides an initial theoretical step toward understanding these phenomena and shows that these implicit biases are in fact various tips of the same iceberg.
arXiv Detail & Related papers (2023-10-26T01:11:17Z) - The Implicit Bias of Batch Normalization in Linear Models and Two-layer
Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z) - Penalising the biases in norm regularisation enforces sparsity [28.86954341732928]
This work shows that the parameters' norm required to represent a function is given by the total variation of its second derivative, weighted by a $\sqrt{1+x^2}$ factor.
Notably, this weighting factor disappears when the norm of bias terms is not regularised.
arXiv Detail & Related papers (2023-03-02T15:33:18Z) - A Convergence Theory for Federated Average: Beyond Smoothness [28.074273047592065]
Federated learning enables a large number of edge computing devices to jointly learn a model without data sharing.
As a leading algorithm in this setting, Federated Average (FedAvg), which runs Stochastic Gradient Descent (SGD) in parallel on local devices, has been widely used.
This paper provides a theoretical convergence study on Federated Learning.
arXiv Detail & Related papers (2022-11-03T04:50:49Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - Universal Approximation for Log-concave Distributions using
Well-conditioned Normalizing Flows [20.022920482589324]
We show that any log-concave distribution can be approximated using well-conditioned affine-coupling flows.
Our results also inform the practice of training affine couplings.
arXiv Detail & Related papers (2021-07-07T00:13:50Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose widths are quadratic in the sample size and linear in their depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Principled Interpolation in Normalizing Flows [5.582101184758527]
Generative models based on normalizing flows are very successful in modeling complex data distributions.
However, straightforward linear interpolations show unexpected side effects, as interpolation paths lie outside the area where samples are observed.
This observation suggests that correcting the norm should generally result in better interpolations, but it is not clear how to correct the norm in an unambiguous way.
arXiv Detail & Related papers (2020-10-22T21:02:10Z) - The Convergence Indicator: Improved and completely characterized
parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to calculate whether the particles will finally converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z)