Self Normalizing Flows
- URL: http://arxiv.org/abs/2011.07248v2
- Date: Wed, 9 Jun 2021 12:14:06 GMT
- Title: Self Normalizing Flows
- Authors: T. Anderson Keller, Jorn W.T. Peters, Priyank Jaini, Emiel Hoogeboom,
Patrick Forré, Max Welling
- Abstract summary: We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
- Score: 65.73510214694987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient gradient computation of the Jacobian determinant term is a core
problem in many machine learning settings, and especially so in the normalizing
flow framework. Most proposed flow models therefore either restrict to a
function class with easy evaluation of the Jacobian determinant, or an
efficient estimator thereof. However, these restrictions limit the performance
of such density models, frequently requiring significant depth to reach desired
performance levels. In this work, we propose Self Normalizing Flows, a flexible
framework for training normalizing flows by replacing expensive terms in the
gradient by learned approximate inverses at each layer. This reduces the
computational complexity of each layer's exact update from $\mathcal{O}(D^3)$
to $\mathcal{O}(D^2)$, allowing for the training of flow architectures which
were otherwise computationally infeasible, while also providing efficient
sampling. We show experimentally that such models are remarkably stable and
optimize to similar data likelihood values as their exact gradient
counterparts, while training more quickly and surpassing the performance of
functionally constrained counterparts.
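To make the mechanism concrete, below is a minimal sketch of the self-normalizing update for a single linear flow layer under a standard-normal prior: a weight matrix `W` and a learned approximate inverse `R` are trained jointly, `R` via a reconstruction penalty, and the exact $W^{-\top}$ term in the gradient of $-\log|\det W|$ is replaced by $R^\top$. This is an illustrative PyTorch reading of the paper's linear-layer case, not the authors' implementation; the prior, `lam`, and the learning rate are placeholder choices.

```python
import torch

D, B, lam, lr = 8, 64, 1.0, 1e-3
W = (torch.eye(D) + 0.01 * torch.randn(D, D)).requires_grad_()  # forward weight
R = W.detach().clone().requires_grad_()                         # learned approximate inverse
opt = torch.optim.Adam([W, R], lr=lr)

for step in range(1000):
    x = torch.randn(B, D)        # stand-in for training data
    z = x @ W.T                  # forward pass z = W x
    x_hat = z @ R.T              # approximate inverse pass

    nll_data = 0.5 * (z ** 2).sum(1).mean()   # -log p(z) under a standard-normal prior
    recon = ((x - x_hat) ** 2).sum(1).mean()  # pushes R toward W^{-1}

    opt.zero_grad()
    (nll_data + lam * recon).backward()
    with torch.no_grad():
        # exact gradient of the -log|det W| term is -W^{-T}, which costs O(D^3);
        # self-normalizing flows substitute -R^T, which is free once R is learned
        W.grad -= R.T
    opt.step()
```

Each step then costs only matrix multiplications, $\mathcal{O}(D^2)$ per sample, whereas the exact update would require forming $W^{-\top}$ at $\mathcal{O}(D^3)$.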
Related papers
- Fast and Unified Path Gradient Estimators for Normalizing Flows [5.64979077798699]
Path gradient estimators for normalizing flows have lower variance than standard estimators for variational inference.
We propose a fast path gradient estimator which improves computational efficiency significantly.
We empirically establish its superior performance and reduced variance for several natural sciences applications.
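To illustrate the underlying path-gradient ("sticking the landing") idea, here is a minimal sketch on a reparameterized Gaussian rather than a flow; it is not the paper's unified estimator. Detaching the variational parameters inside the density evaluation routes the gradient only through the sampled path, dropping the zero-mean but high-variance score term.

```python
import math
import torch

# Variational parameters of q(z) = N(mu, exp(log_sigma)^2)
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)

eps = torch.randn(1024, 2)
z = mu + log_sigma.exp() * eps   # reparameterized sample (the "path")

def log_q(z, m, ls):
    return (-0.5 * ((z - m) / ls.exp()) ** 2 - ls
            - 0.5 * math.log(2 * math.pi)).sum(-1)

def log_p(z):                    # unnormalized target density
    return (-0.5 * z ** 2).sum(-1)

# Path gradient: detach the parameters inside log_q so gradients reach
# mu/log_sigma only through z; the score term drops out of the estimator.
kl = (log_q(z, mu.detach(), log_sigma.detach()) - log_p(z)).mean()
kl.backward()
```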
arXiv Detail & Related papers (2024-03-23T16:21:22Z)
- Free-form Flows: Make Any Architecture a Normalizing Flow [8.163244519983298]
We develop a training procedure that uses an efficient estimator for the gradient of the change of variables formula.
This enables any dimension-preserving neural network to serve as a generative model through maximum likelihood training.
We achieve excellent results in molecule generation benchmarks utilizing $E(n)$-equivariant networks.
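One plausible reading of such an estimator, sketched in PyTorch (this is not the authors' code): with a dimension-preserving network `f` and a learned approximate inverse `g`, a Hutchinson-style surrogate $v^\top J_g J_f v$, with $J_g$ held constant, has a parameter gradient approximating that of $\log|\det J_f|$.

```python
import torch
import torch.nn as nn

D = 4
f = nn.Sequential(nn.Linear(D, D), nn.Tanh(), nn.Linear(D, D))  # dimension-preserving model
g = nn.Sequential(nn.Linear(D, D), nn.Tanh(), nn.Linear(D, D))  # learned approximate inverse

x = torch.randn(16, D)
v = torch.randn_like(x)  # Hutchinson probe vector

# J_f v, kept differentiable w.r.t. f's parameters
z, Jf_v = torch.autograd.functional.jvp(f, x, v, create_graph=True)
# J_g^T v at z = f(x), treated as a constant
_, JgT_v = torch.autograd.functional.vjp(g, z.detach(), v)

# the value is not log|det J_f|, but its parameter gradient approximates
# that of log|det J_f| when g is close to f^{-1}
logdet_surrogate = (JgT_v.detach() * Jf_v).sum(1)

nll = 0.5 * (z ** 2).sum(1) - logdet_surrogate  # standard-normal prior
recon = ((x - g(f(x))) ** 2).sum(1)             # keeps g near f^{-1}
(nll + recon).mean().backward()
```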
arXiv Detail & Related papers (2023-10-25T13:23:08Z)
- Training Energy-Based Normalizing Flow with Score-Matching Objectives [36.0810550035231]
We present a new flow-based modeling approach called energy-based normalizing flow (EBFlow).
We demonstrate that by optimizing EBFlow with score-matching objectives, the computation of Jacobian determinants for linear transformations can be entirely bypassed.
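For context, here is a generic denoising score matching objective (an illustration of score-matching training in general, not EBFlow's exact setup); note that it involves no normalizing constant and no Jacobian determinant.

```python
import torch
import torch.nn as nn

D, sigma = 2, 0.1
score_net = nn.Sequential(nn.Linear(D, 64), nn.SiLU(), nn.Linear(64, D))

x = torch.randn(128, D)        # stand-in for training data
noise = torch.randn_like(x)
x_tilde = x + sigma * noise    # perturbed sample

# DSM: match the model score to the score of the Gaussian perturbation,
# grad_x log N(x_tilde | x, sigma^2 I) = -(x_tilde - x) / sigma^2 = -noise / sigma
target = -noise / sigma
loss = ((score_net(x_tilde) - target) ** 2).sum(1).mean()
loss.backward()
```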
arXiv Detail & Related papers (2023-05-24T15:54:29Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models increases their compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
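As a loose illustration of zero-shot structured pruning (not the paper's exact procedure), PyTorch's built-in pruning utilities can zero out whole convolutional filters by L1 norm without any fine-tuning data:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# stand-in for a pretrained model
model = nn.Sequential(nn.Conv2d(3, 32, 3), nn.ReLU(), nn.Conv2d(32, 64, 3))

for module in model.modules():
    if isinstance(module, nn.Conv2d):
        # zero out the 30% of output filters (dim=0) with the smallest L1 norm
        prune.ln_structured(module, name="weight", amount=0.3, n=1, dim=0)
        prune.remove(module, "weight")  # bake the mask into the weights

out = model(torch.randn(1, 3, 32, 32))  # pruned model still runs as before
```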
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
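A simplified sketch of the velocity-constraint idea for a single smooth inequality constraint $g(x) \le 0$ (a schematic reading, not the paper's accelerated algorithms): rather than projecting the iterate onto the feasible set, the update direction (velocity) is projected onto a half-space that bounds how fast the constraint may grow.

```python
import numpy as np

def step(x, grad_f, g, grad_g, lr=0.1, alpha=1.0):
    d = -grad_f(x)                       # desired velocity
    a, b = grad_g(x), -alpha * g(x)      # require a @ v <= b (drift toward feasibility)
    if a @ d > b:                        # project d onto the half-space {v : a @ v <= b}
        d = d - ((a @ d - b) / (a @ a)) * a
    return x + lr * d

# example: minimize ||x||^2 subject to x[0] >= 1, i.e. g(x) = 1 - x[0] <= 0
f_grad = lambda x: 2 * x
g = lambda x: 1.0 - x[0]
g_grad = lambda x: np.array([-1.0, 0.0])

x = np.array([2.0, 2.0])
for _ in range(200):
    x = step(x, f_grad, g, g_grad)
print(x)  # approaches the constrained optimum [1, 0]
```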
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
- Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
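Schematically, a DEQ estimator swaps K unrolled refinement steps for a black-box fixed-point solve. The sketch below uses plain fixed-point iteration and a cheap one-step backward pass, an approximation in the spirit of inexact gradients rather than full implicit differentiation; convergence of the forward solve assumes the update is a contraction.

```python
import torch
import torch.nn as nn

class DEQLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def f(self, z, x):  # one refinement step z <- f(z, x)
        return torch.tanh(self.lin(torch.cat([z, x], dim=-1)))

    def forward(self, x, iters=50):
        z = torch.zeros_like(x)
        with torch.no_grad():          # black-box fixed-point solve, no graph kept
            for _ in range(iters):
                z = self.f(z, x)
        return self.f(z.detach(), x)   # re-attach one step for approximate gradients

layer = DEQLayer(8)
out = layer(torch.randn(4, 8))
out.sum().backward()                   # gradients flow through the single attached step
```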
arXiv Detail & Related papers (2022-04-18T17:53:44Z)
- Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between this equilibrium solving and gradient-based optimization over the network's inputs.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
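That synergy can be sketched as a single loop that interleaves the equilibrium update with a gradient step on the input (a schematic reading with toy names, not the authors' exact scheme):

```python
import torch

# toy layer f and objective: find z* = f(z*, x) while optimizing x so that
# the equilibrium output matches a target
W_z = torch.randn(8, 8) * 0.1   # small weights so f is (roughly) a contraction
W_x = torch.randn(8, 8) * 0.1
f = lambda z, x: torch.tanh(z @ W_z.T + x @ W_x.T)

target = torch.randn(4, 8)
x = torch.zeros(4, 8, requires_grad=True)
opt = torch.optim.Adam([x], lr=1e-2)

z = torch.zeros(4, 8)
for _ in range(500):
    z = f(z.detach(), x)               # one equilibrium update
    loss = ((z - target) ** 2).mean()  # loss at the (approximate) equilibrium
    opt.zero_grad()
    loss.backward()                    # input-optimization step, jointly with the solve
    opt.step()
```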
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivatives with finite differences.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
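The core primitive is simple to state: a central finite difference approximates a directional derivative from two function evaluations, with no backward pass required. A minimal sketch of generic finite differences (not the paper's full score matching objectives):

```python
import torch

def directional_derivative_fd(f, x, v, eps=1e-3):
    # v . grad f(x) ~= (f(x + eps v) - f(x - eps v)) / (2 eps)
    # two function evaluations, no gradient computation, both parallelizable
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

f = lambda x: (x ** 3).sum(-1)   # toy scalar function
x = torch.randn(5, 3, requires_grad=True)
v = torch.randn_like(x)

approx = directional_derivative_fd(f, x.detach(), v)
exact = (torch.autograd.grad(f(x).sum(), x)[0] * v).sum(-1)
print((approx - exact).abs().max())  # small discretization error
```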
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- Relative gradient optimization of the Jacobian term in unsupervised deep learning [9.385902422987677]
Learning expressive probabilistic models correctly describing the data is a ubiquitous problem in machine learning.
Deep density models have been widely used for this task, but their maximum likelihood based training requires estimating the log-determinant of the Jacobian.
We propose a new approach for exact training of such neural networks.
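Sketched for a single linear layer $z = Wx$ (a minimal reading of the relative-gradient trick, with illustrative hyperparameters): right-multiplying the Euclidean gradient by $W^\top W$ turns the log-det term's gradient $-W^{-\top}$ into $-W$, so training never forms an inverse or a determinant.

```python
import torch

D, B, lr = 8, 64, 1e-3
W = (torch.eye(D) + 0.01 * torch.randn(D, D)).requires_grad_()

for step in range(1000):
    x = torch.randn(B, D)                    # stand-in for training data
    z = x @ W.T
    nll_data = 0.5 * (z ** 2).sum(1).mean()  # -log p(z), standard-normal prior
    (g_data,) = torch.autograd.grad(nll_data, W)

    with torch.no_grad():
        # the full Euclidean gradient would be g_data - W^{-T} (O(D^3) to form);
        # right-multiplying by W^T W gives g_data W^T W - W, with no inversion
        W -= lr * (g_data @ W.T @ W - W)
```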
arXiv Detail & Related papers (2020-06-26T16:41:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.