Whitening Convergence Rate of Coupling-based Normalizing Flows
- URL: http://arxiv.org/abs/2210.14032v1
- Date: Tue, 25 Oct 2022 14:10:34 GMT
- Title: Whitening Convergence Rate of Coupling-based Normalizing Flows
- Authors: Felix Draxler, Christoph Schnörr, Ullrich Köthe
- Abstract summary: Existing work shows that such flows weakly converge to arbitrary data distributions.
We prove that all coupling-based normalizing flows perform whitening of the data distribution.
We derive corresponding convergence bounds that show a linear convergence rate in the depth of the flow.
- Score: 1.1279808969568252
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Coupling-based normalizing flows (e.g. RealNVP) are a popular family of
normalizing flow architectures that work surprisingly well in practice. This
calls for theoretical understanding. Existing work shows that such flows weakly
converge to arbitrary data distributions. However, they make no statement about
the stricter convergence criterion used in practice, the maximum likelihood
loss. For the first time, we make a quantitative statement about this kind of
convergence: We prove that all coupling-based normalizing flows perform
whitening of the data distribution (i.e. diagonalize the covariance matrix) and
derive corresponding convergence bounds that show a linear convergence rate in
the depth of the flow. Numerical experiments demonstrate the implications of
our theory and point at open questions.
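The whitening claim can be illustrated numerically. The sketch below is an illustration under simplified assumptions, not the paper's construction: it stacks blocks consisting of a random rotation followed by a linear affine coupling layer fitted in closed form on a correlated Gaussian, and tracks how the off-diagonal mass of the covariance evolves with depth, together with the KL divergence to the standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Random correlated covariance as the "data" distribution (zero-mean Gaussian).
A = rng.standard_normal((d, d))
cov = A @ A.T / d

def off_diag_norm(S):
    """Frobenius norm of the off-diagonal part, a simple non-whiteness measure."""
    return np.linalg.norm(S - np.diag(np.diag(S)))

def kl_to_standard_normal(S):
    """KL divergence from N(0, S) to N(0, I); never increases under a block below."""
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (np.trace(S) - len(S) - logdet)

def coupling_block(S):
    """One illustrative block: random rotation, then a linear affine coupling layer.

    The coupling keeps the first half (active) fixed, subtracts the best linear
    prediction of the second half (passive) given the active half, and rescales
    the passive residuals to unit variance.  For Gaussian data this is the
    maximum-likelihood linear coupling and it zeroes the cross-block covariance.
    """
    Q = np.linalg.qr(rng.standard_normal((d, d)))[0]   # random rotation
    S = Q @ S @ Q.T
    k = d // 2
    Saa, Sab, Sbb = S[:k, :k], S[:k, k:], S[k:, k:]
    W = Sab.T @ np.linalg.inv(Saa)                     # regression of passive on active
    Sbb_res = Sbb - W @ Sab                            # residual covariance of passive half
    scale = 1.0 / np.sqrt(np.diag(Sbb_res))
    T = np.eye(d)
    T[k:, :k] = -W
    T[k:, :] *= scale[:, None]                         # elementwise rescaling of passive dims
    return T @ S @ T.T

print(f"depth  0: off-diag {off_diag_norm(cov):.4f}  KL {kl_to_standard_normal(cov):.4f}")
for depth in range(1, 9):
    cov = coupling_block(cov)
    print(f"depth {depth:2d}: off-diag {off_diag_norm(cov):.4f}  KL {kl_to_standard_normal(cov):.4f}")
```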
Related papers
- A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Averaging iterates of Stochastic Gradient Descent (SGD) has achieved empirical success in training deep learning models, with methods such as Stochastic Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA).
In this paper, we generalize LAWA as Finite Weight Averaging (FWA) and explain their advantages compared to SGD from the perspective of optimization and generalization.
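A minimal sketch contrasting the averaging schemes mentioned above on noisy SGD iterates for a toy quadratic; the decay constant, window size, and learning rate are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, steps, lr = 5, 200, 0.1

# Noisy SGD iterates on a simple quadratic f(w) = 0.5 * ||w||^2 (optimum at 0).
w = rng.standard_normal(dim)
iterates = []
for _ in range(steps):
    grad = w + 0.5 * rng.standard_normal(dim)        # stochastic gradient
    w = w - lr * grad
    iterates.append(w.copy())
iterates = np.array(iterates)

# SWA-style average: uniform mean over all iterates.
swa = iterates.mean(axis=0)

# EMA: exponential moving average with decay beta.
beta, ema = 0.9, iterates[0].copy()
for wt in iterates[1:]:
    ema = beta * ema + (1 - beta) * wt

# Finite / LAtest Weight Averaging: uniform mean over the last k iterates only.
k = 20
fwa = iterates[-k:].mean(axis=0)

for name, vec in [("last iterate", iterates[-1]), ("SWA", swa), ("EMA", ema), ("FWA(k=20)", fwa)]:
    print(f"{name:12s} distance to optimum: {np.linalg.norm(vec):.3f}")
```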
arXiv Detail & Related papers (2024-11-20T10:08:22Z) - Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence [54.580605276017096]
Diffusion models have emerged as a powerful tool for image generation and denoising.
Recently, Liu et al. designed a novel alternative generative model Rectified Flow (RF)
RF aims to learn straight flow trajectories from noise to data using a sequence of convex optimization problems.
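A minimal sketch of the rectified-flow recipe on toy 2D data: build straight paths between noise and data, regress a velocity field onto the path direction, and generate by integrating that field from noise. The linear velocity model stands in for the neural network and is not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4000, 2

# Toy data distribution: a shifted, correlated Gaussian.
x1 = rng.standard_normal((n, d)) @ np.array([[1.0, 0.6], [0.0, 0.8]]) + 3.0
x0 = rng.standard_normal((n, d))            # noise samples
t = rng.uniform(size=(n, 1))                # interpolation times

xt = (1 - t) * x0 + t * x1                  # points on the straight paths
target = x1 - x0                            # velocity of the straight path

# Fit a simple linear velocity model v(x, t) = [x, t, 1] @ A by least squares.
feats = np.hstack([xt, t, np.ones((n, 1))])
A, *_ = np.linalg.lstsq(feats, target, rcond=None)

# Generate: integrate dx/dt = v(x, t) from noise with Euler steps.
x = rng.standard_normal((n, d))
for step in range(100):
    tt = np.full((n, 1), step / 100)
    x = x + 0.01 * np.hstack([x, tt, np.ones((n, 1))]) @ A

print("data mean:", x1.mean(axis=0), " generated mean:", x.mean(axis=0))
```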
arXiv Detail & Related papers (2024-10-19T02:36:11Z) - On the Universality of Coupling-based Normalizing Flows [10.479969050570684]
We prove a distributional universality theorem for well-conditioned coupling-based normalizing flows such as RealNVP.
We show that volume-preserving normalizing flows are not universal, identify the distribution they learn instead, and show how to restore their expressivity.
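A small hedged sketch of the structural difference at play: an affine coupling layer (RealNVP-style) has a non-trivial Jacobian determinant, whereas an additive, volume-preserving coupling layer (NICE-style) always has determinant one. The random linear maps stand in for learned subnetworks.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal((3, d))                  # a small batch
k = d // 2
a, b = x[:, :k], x[:, k:]

# Toy "subnetworks": fixed random linear maps standing in for learned networks.
W_s = 0.1 * rng.standard_normal((k, d - k))      # produces log-scales
W_t = rng.standard_normal((k, d - k))            # produces shifts

# Affine coupling (RealNVP-style): b' = b * exp(s(a)) + t(a).
log_s, shift = a @ W_s, a @ W_t
b_affine = b * np.exp(log_s) + shift
logdet_affine = log_s.sum(axis=1)                # per-sample log |det J|

# Additive / volume-preserving coupling (NICE-style): b' = b + t(a).
b_additive = b + shift
logdet_additive = np.zeros(len(x))               # Jacobian determinant is always 1

print("affine   log|det J|:", logdet_affine)
print("additive log|det J|:", logdet_additive)
```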
arXiv Detail & Related papers (2024-02-09T17:51:43Z) - The Implicit Bias of Batch Normalization in Linear Models and Two-layer
Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z) - Penalising the biases in norm regularisation enforces sparsity [28.86954341732928]
This work shows that the parameter norm required to represent a function is given by the total variation of its second derivative, weighted by a $\sqrt{1+x^2}$ factor.
Notably, this weighting factor disappears when the norm of bias terms is not regularised.
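Read as a formula, the claim above says the minimal parameter norm needed to represent a univariate function $f$ behaves like a weighted total variation of its second derivative; the following is one schematic way to write that statement, with the precise setting left to the paper.

```latex
R(f) \;=\; \int_{\mathbb{R}} \sqrt{1+x^{2}} \;\mathrm{d}\lvert f'' \rvert(x),
\qquad\text{and the weight } \sqrt{1+x^{2}} \text{ disappears when biases are not regularised.}
```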
arXiv Detail & Related papers (2023-03-02T15:33:18Z) - A Convergence Theory for Federated Average: Beyond Smoothness [28.074273047592065]
Federated learning enables a large number of edge computing devices to jointly learn a model without sharing data.
As a leading algorithm in this setting, Federated Averaging (FedAvg), which runs Stochastic Gradient Descent (SGD) in parallel on local devices, has been widely used.
This paper provides a theoretical convergence study on Federated Learning.
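A minimal FedAvg sketch on synthetic linear-regression clients, to make the local-SGD-then-average structure concrete; the number of clients, rounds, and learning rate are illustrative, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)
clients, dim, rounds, local_steps, lr = 10, 5, 20, 5, 0.1

# Each client holds its own linear-regression data.
true_w = rng.standard_normal(dim)
data = []
for _ in range(clients):
    X = rng.standard_normal((50, dim))
    y = X @ true_w + 0.1 * rng.standard_normal(50)
    data.append((X, y))

w_global = np.zeros(dim)
for r in range(rounds):
    local_models = []
    for X, y in data:
        w = w_global.copy()
        for _ in range(local_steps):               # local SGD on the client's own data
            i = rng.integers(len(y), size=8)       # a mini-batch
            grad = X[i].T @ (X[i] @ w - y[i]) / len(i)
            w -= lr * grad
        local_models.append(w)
    w_global = np.mean(local_models, axis=0)       # server averages the client models

print("distance to true weights:", np.linalg.norm(w_global - true_w))
```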
arXiv Detail & Related papers (2022-11-03T04:50:49Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
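For context, the quantity being approximated is the probability mass a flow assigns to a closed region. The snippet below estimates it for a toy affine "flow" with naive Monte Carlo sampling, the kind of baseline whose sample efficiency the proposed estimators aim to improve; the affine map is an illustrative stand-in, not a learned flow or the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 2D "flow": an invertible affine map applied to a standard normal base.
A = np.array([[1.0, 0.8], [0.0, 0.6]])
b = np.array([0.5, -0.2])

def sample_flow(n):
    """Push base samples through the flow to get data-space samples."""
    z = rng.standard_normal((n, 2))
    return z @ A.T + b

# Naive Monte Carlo estimate of the probability mass in a closed box region.
lo, hi = np.array([0.0, -1.0]), np.array([1.5, 1.0])
x = sample_flow(200_000)
inside = np.all((x >= lo) & (x <= hi), axis=1)
est = inside.mean()
stderr = inside.std() / np.sqrt(len(x))
print(f"P(X in box) ~ {est:.4f} +/- {stderr:.4f}")
```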
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - Universal Approximation for Log-concave Distributions using
Well-conditioned Normalizing Flows [20.022920482589324]
We show that any log-concave distribution can be approximated using well-conditioned affine-coupling flows.
Our results also inform the practice of training affine couplings.
arXiv Detail & Related papers (2021-07-07T00:13:50Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, within a number of iterations logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Principled Interpolation in Normalizing Flows [5.582101184758527]
Generative models based on normalizing flows are very successful in modeling complex data distributions.
Straightforward linear interpolations show unexpected side effects, as interpolation paths lie outside the area where samples are observed.
This observation suggests that correcting the norm should generally result in better interpolations, but it is not clear how to correct the norm in an unambiguous way.
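The side effect can be seen directly: in a high-dimensional Gaussian latent space, samples concentrate near norm $\sqrt{d}$, while the midpoint of a linear interpolation has a noticeably smaller norm. The sketch below shows this together with one naive correction (rescaling to the typical norm), which is only an illustration and not the paper's proposed solution.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512                                          # a typical high-dimensional latent space

z0, z1 = rng.standard_normal(d), rng.standard_normal(d)
typical_norm = np.sqrt(d)                        # ||z|| concentrates around sqrt(d) for N(0, I)

for t in np.linspace(0, 1, 5):
    lin = (1 - t) * z0 + t * z1                  # straightforward linear interpolation
    corrected = lin / np.linalg.norm(lin) * typical_norm   # rescale to the typical norm
    print(f"t={t:.2f}  linear norm {np.linalg.norm(lin):7.2f}   "
          f"corrected norm {np.linalg.norm(corrected):7.2f}   (typical {typical_norm:.2f})")
```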
arXiv Detail & Related papers (2020-10-22T21:02:10Z) - The Convergence Indicator: Improved and completely characterized
parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to calculate whether the particles will finally converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
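For reference, the parameter bounds concern the standard PSO velocity/position recursion with inertia $w$ and acceleration coefficients $c_1, c_2$; the sketch below runs that standard update on the sphere function with illustrative parameter values. The convergence indicator itself is defined in the paper and not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles, dim, iters = 30, 5, 200
w, c1, c2 = 0.7, 1.5, 1.5                  # inertia and acceleration coefficients (illustrative)

def f(x):
    return np.sum(x ** 2, axis=1)          # sphere function, minimum at the origin

x = rng.uniform(-5, 5, (n_particles, dim))
v = np.zeros_like(x)
pbest, pbest_val = x.copy(), f(x)
gbest = pbest[np.argmin(pbest_val)]

for _ in range(iters):
    r1, r2 = rng.uniform(size=x.shape), rng.uniform(size=x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # standard velocity update
    x = x + v
    vals = f(x)
    better = vals < pbest_val
    pbest[better], pbest_val[better] = x[better], vals[better]
    gbest = pbest[np.argmin(pbest_val)]

spread = np.linalg.norm(x - x.mean(axis=0), axis=1).mean()
print(f"best value {pbest_val.min():.2e}, mean particle spread {spread:.2e}")
```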
arXiv Detail & Related papers (2020-06-06T19:08:05Z)