Representational aspects of depth and conditioning in normalizing flows
- URL: http://arxiv.org/abs/2010.01155v2
- Date: Fri, 25 Jun 2021 23:48:35 GMT
- Title: Representational aspects of depth and conditioning in normalizing flows
- Authors: Frederic Koehler, Viraj Mehta, Andrej Risteski
- Abstract summary: We show that representationally the choice of partition is not a bottleneck for depth.
We also show that shallow affine coupling networks are universal approximators in Wasserstein distance.
- Score: 33.4333537858003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Normalizing flows are among the most popular paradigms in generative
modeling, especially for images, primarily because we can efficiently evaluate
the likelihood of a data point. This is desirable both for evaluating the fit
of a model, and for ease of training, as maximizing the likelihood can be done
by gradient descent. However, training normalizing flows comes with
difficulties as well: models which produce good samples typically need to be
extremely deep -- which comes with accompanying vanishing/exploding gradient
problems. A very related problem is that they are often poorly conditioned:
since they are parametrized as invertible maps from $\mathbb{R}^d \to
\mathbb{R}^d$, and typical training data, such as images, is intuitively
lower-dimensional, the learned maps often have Jacobians that are close to
singular.
In our paper, we tackle representational aspects around depth and
conditioning of normalizing flows: both for general invertible architectures,
and for a particular common architecture, affine couplings. We prove that
$\Theta(1)$ affine coupling layers suffice to exactly represent a permutation
or $1 \times 1$ convolution, as used in GLOW, showing that representationally
the choice of partition is not a bottleneck for depth. We also show that
shallow affine coupling networks are universal approximators in Wasserstein
distance if ill-conditioning is allowed, and experimentally investigate related
phenomena involving padding. Finally, we show a depth lower bound for general
flow architectures with few neurons per layer and bounded Lipschitz constant.
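For readers unfamiliar with the architecture discussed above, the following is a minimal NumPy sketch of an affine coupling layer with the common half/half partition of the coordinates, together with a toy composition of four such layers that swaps two coordinates. The specific scale/shift functions and the swap construction are illustrative assumptions, not the paper's construction; they only show why the triangular Jacobian makes the log-likelihood term cheap and why a constant number of coupling layers can already represent a simple permutation.

```python
# Minimal sketch of an affine coupling layer (illustrative; not the paper's code).
import numpy as np

def affine_coupling(x, s, t, update_second_half=True):
    """y = (x1, x2 * s(x1) + t(x1)) for a fixed half/half partition of the coordinates.

    One half passes through unchanged, the other half is transformed elementwise,
    so the Jacobian is triangular and log|det J| = sum(log|s(x1)|). Requiring
    s(x1) != 0 keeps the layer invertible: x2 = (y2 - t(x1)) / s(x1).
    """
    d = x.shape[-1] // 2
    if update_second_half:
        x1, x2 = x[..., :d], x[..., d:]
        y2 = x2 * s(x1) + t(x1)
        y = np.concatenate([x1, y2], axis=-1)
    else:  # condition on the second half and update the first
        x1, x2 = x[..., d:], x[..., :d]
        y2 = x2 * s(x1) + t(x1)
        y = np.concatenate([y2, x1], axis=-1)
    return y, np.sum(np.log(np.abs(s(x1))), axis=-1)  # output and log|det Jacobian|


# Toy illustration of the Theta(1)-layers idea in d = 2: three additive "shear"
# couplings plus one sign-flipping coupling exactly swap the two coordinates.
one, neg_one = (lambda u: np.ones_like(u)), (lambda u: -np.ones_like(u))
ident, neg, zero = (lambda u: u), (lambda u: -u), (lambda u: np.zeros_like(u))

x = np.array([[3.0, 5.0]])
y, _ = affine_coupling(x, s=one, t=ident, update_second_half=True)      # (a, b) -> (a, a + b)
y, _ = affine_coupling(y, s=one, t=neg, update_second_half=False)       # -> (-b, a + b)
y, _ = affine_coupling(y, s=one, t=ident, update_second_half=True)      # -> (-b, a)
y, _ = affine_coupling(y, s=neg_one, t=zero, update_second_half=False)  # -> (b, a)
print(y)  # [[5. 3.]] -- the coordinates of (3, 5) are swapped
```

Note that the last layer uses a negative scale; with a strictly positive scale parametrization such as the exponential used in RealNVP, the sign flip would have to be realized differently.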
Related papers
- Scale Propagation Network for Generalizable Depth Completion [16.733495588009184]
We propose a novel scale propagation normalization (SP-Norm) method to propagate scales from input to output.
We also develop a new network architecture based on SP-Norm and the ConvNeXt V2 backbone.
Our model consistently achieves the best accuracy with faster speed and lower memory when compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-10-24T03:53:06Z)
- NeuralGF: Unsupervised Point Normal Estimation by Learning Neural Gradient Function [55.86697795177619]
Normal estimation for 3D point clouds is a fundamental task in 3D geometry processing.
We introduce a new paradigm for learning neural gradient functions, which encourages the neural network to fit the input point clouds.
Our excellent results on widely used benchmarks demonstrate that our method can learn more accurate normals for both unoriented and oriented normal estimation tasks.
arXiv Detail & Related papers (2023-11-01T09:25:29Z)
- Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability for local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
- Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z)
- Universal Approximation for Log-concave Distributions using Well-conditioned Normalizing Flows [20.022920482589324]
We show that any log-concave distribution can be approximated using well-conditioned affine-coupling flows.
Our results also inform the practice of training affine couplings.
arXiv Detail & Related papers (2021-07-07T00:13:50Z)
- DiGS: Divergence guided shape implicit neural representation for unoriented point clouds [36.60407995156801]
Shape implicit neural representations (INRs) have recently shown to be effective in shape analysis and reconstruction tasks.
We propose a divergence guided shape representation learning approach that does not require normal vectors as input.
arXiv Detail & Related papers (2021-06-21T02:10:03Z)
- Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and accurate flow estimation can be achieved with only a fraction of elements in it.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z)
- Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$ (a rough sketch of this idea appears after this list).
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
arXiv Detail & Related papers (2020-11-14T09:51:51Z)
- End-to-end Interpretable Learning of Non-blind Image Deblurring [102.75982704671029]
Non-blind image deblurring is typically formulated as a linear least-squares problem regularized by natural priors on the corresponding sharp picture's gradients.
We propose to precondition the Richardson solver using approximate inverse filters of the (known) blur and natural image prior kernels.
arXiv Detail & Related papers (2020-07-03T15:45:01Z)
- Neural Ordinary Differential Equations on Manifolds [0.342658286826597]
Normalizing flows in Euclidean space based on Neural ODEs have recently shown great promise, yet they suffer from the same limitations.
We show how vector fields provide a general framework for parameterizing a flexible class of invertible mapping on these spaces.
arXiv Detail & Related papers (2020-06-11T17:56:34Z)
- You say Normalizing Flows I see Bayesian Networks [11.23030807455021]
We show that normalizing flows reduce to Bayesian networks with a pre-defined topology and a learnable density at each node.
We show that stacking multiple transformations in a normalizing flow relaxes independence assumptions and entangles the model distribution.
We prove the non-universality of the affine normalizing flow, regardless of its depth.
arXiv Detail & Related papers (2020-06-01T11:54:50Z)
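As referenced in the Self Normalizing Flows entry above, here is a rough sketch of the approximate-inverse idea as described in that summary (not the authors' implementation). For a linear flow layer $z = Wx$ under a standard-normal base, the expensive term in the exact gradient is $\partial \log|\det W| / \partial W = (W^{-1})^\top$, which costs $\mathcal{O}(D^3)$; keeping a learned matrix $R$ trained to approximate $W^{-1}$ and substituting $R^\top$ brings each update down to $\mathcal{O}(D^2)$. The dimensions, learning rate, and toy training loop below are arbitrary assumptions.

```python
# Rough sketch of the approximate-inverse trick described in the
# "Self Normalizing Flows" summary above; hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
D = 8
W = np.eye(D) + 0.1 * rng.standard_normal((D, D))  # weight of a linear flow layer z = W x
R = np.eye(D)                                       # learned approximation to W^{-1}
lr = 1e-3

for step in range(5000):
    x = rng.standard_normal(D)
    z = W @ x

    # Exact gradient of log p(x) = -0.5*||W x||^2 + log|det W| + const w.r.t. W:
    #   -outer(z, x) + inv(W).T      <- the matrix inverse makes this O(D^3)
    # Approximation: replace inv(W).T with R.T, which costs only O(D^2).
    W += lr * (-np.outer(z, x) + R.T)

    # Keep R close to W^{-1} by descending the reconstruction penalty ||R z - x||^2.
    R -= lr * 2.0 * np.outer(R @ z - x, z)

print("||R W - I|| =", np.linalg.norm(R @ W - np.eye(D)))  # how far R is from inverting W
```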
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.