Machine Learning Trivializing Maps: A First Step Towards Understanding
How Flow-Based Samplers Scale Up
- URL: http://arxiv.org/abs/2112.15532v1
- Date: Fri, 31 Dec 2021 16:17:19 GMT
- Title: Machine Learning Trivializing Maps: A First Step Towards Understanding
How Flow-Based Samplers Scale Up
- Authors: Luigi Del Debbio and Joe Marsh Rossney and Michael Wilson
- Abstract summary: We show that approximations of trivializing maps can be `machine-learned' by a class of invertible, differentiable models.
We conduct an exploratory scaling study using two-dimensional $\phi^4$ with up to $20^2$ lattice sites.
- Score: 0.6445605125467573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A trivializing map is a field transformation whose Jacobian determinant
exactly cancels the interaction terms in the action, providing a representation
of the theory in terms of a deterministic transformation of a distribution from
which sampling is trivial. Recently, a proof-of-principle study by Albergo,
Kanwar and Shanahan [arXiv:1904.12072] demonstrated that approximations of
trivializing maps can be `machine-learned' by a class of invertible,
differentiable neural models called \textit{normalizing flows}. By ensuring
that the Jacobian determinant can be computed efficiently, asymptotically exact
sampling from the theory of interest can be performed by drawing samples from a
simple distribution and passing them through the network. From a theoretical
perspective, this approach has the potential to become more efficient than
traditional Markov Chain Monte Carlo sampling techniques, where
autocorrelations severely diminish the sampling efficiency as one approaches
the continuum limit. A major caveat is that it is not yet understood how the
size of models and the cost of training them is expected to scale. As a first
step, we have conducted an exploratory scaling study using two-dimensional
$\phi^4$ with up to $20^2$ lattice sites. Although the scope of our study is
limited to a particular model architecture and training algorithm, initial
results paint an interesting picture in which training costs grow very quickly
indeed. We describe a candidate explanation for the poor scaling, and outline
our intentions to clarify the situation in future work.
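The sampling scheme described in the abstract can be summarised in a few lines of code. The sketch below is illustrative only: `flow_sample` is a hypothetical stand-in for a trained normalizing flow that returns field configurations together with their log-densities $\log q(\phi)$, and the $\phi^4$ couplings are placeholder values, not those used in the paper. Exactness comes from the independence Metropolis accept/reject step applied to the flow's proposals.

```python
# Minimal sketch (not the paper's implementation): independence Metropolis
# sampling for 2D phi^4 with a trained normalizing flow as the proposal.
# `flow_sample(n)` is a hypothetical callable returning (configs, log q).
import numpy as np

def phi4_action(phi, m2=-4.0, lam=6.975):
    """Standard lattice phi^4 action; the couplings here are illustrative only."""
    kinetic = sum(0.5 * (np.roll(phi, -1, axis=d) - phi) ** 2 for d in (0, 1))
    potential = 0.5 * m2 * phi ** 2 + lam * phi ** 4
    return np.sum(kinetic + potential)

def metropolis_flow_chain(flow_sample, n_steps, rng=None):
    """Independence Metropolis: accept a proposal with probability
    min(1, exp(logw' - logw)), where logw = -S(phi) - log q(phi)."""
    if rng is None:
        rng = np.random.default_rng(0)
    phi, logq = flow_sample(1)                       # shapes (1, L, L), (1,)
    logw = -phi4_action(phi[0]) - logq[0]
    chain, n_accept = [], 0
    for _ in range(n_steps):
        prop, logq_prop = flow_sample(1)
        logw_prop = -phi4_action(prop[0]) - logq_prop[0]
        if np.log(rng.uniform()) < logw_prop - logw:
            phi, logw = prop, logw_prop
            n_accept += 1
        chain.append(phi[0])
    return np.stack(chain), n_accept / n_steps       # samples and acceptance rate
```

The acceptance rate of this chain is the quantity that degrades when the flow approximates the trivializing map poorly, which is why training cost as a function of lattice size is the central question of the scaling study.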
Related papers
- Disentangled Representation Learning with the Gromov-Monge Gap [65.73194652234848]
Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning.
We introduce a novel approach to disentangled representation learning based on quadratic optimal transport.
We demonstrate the effectiveness of our approach for quantifying disentanglement across four standard benchmarks.
arXiv Detail & Related papers (2024-07-10T16:51:32Z) - Distribution learning via neural differential equations: a nonparametric
statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations.
We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal{F}$.
We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal{F}$: $C^k$ functions and neural networks.
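For context, the likelihood such ODE models are trained on follows from the instantaneous change-of-variables formula, $\frac{d}{dt}\log p(z_t) = -\mathrm{tr}\,\partial_z v(z_t, t)$. The toy below uses an illustrative linear velocity field (not one of the function classes analysed in the paper), for which the Jacobian trace is constant and the formula can be integrated by a simple Euler scheme.

```python
# Minimal sketch of the continuous-flow likelihood: integrate dz/dt = v(z)
# together with d log p / dt = -tr(dv/dz). Linear field chosen for clarity.
import numpy as np

def integrate_flow(z0, A, b, t1=1.0, n_steps=100):
    """Euler-integrate dz/dt = A z + b and the log-density change -tr(A) dt."""
    dt = t1 / n_steps
    z, delta_logp = z0.copy(), 0.0
    for _ in range(n_steps):
        z = z + dt * (A @ z + b)
        delta_logp -= dt * np.trace(A)   # Jacobian trace is exact for a linear field
    return z, delta_logp

# log p_1(z_1) = log p_0(z_0) + delta_logp, with p_0 e.g. a standard Gaussian.
A = np.array([[0.3, 0.1], [0.0, -0.2]]); b = np.array([0.5, -0.5])
z0 = np.random.default_rng(1).standard_normal(2)
z1, dlogp = integrate_flow(z0, A, b)
```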
arXiv Detail & Related papers (2023-09-03T00:21:37Z) - Normalizing flow sampling with Langevin dynamics in the latent space [12.91637880428221]
Normalizing flows (NF) use a continuous generator to map a simple latent (e.g. Gaussian) distribution towards an empirical target distribution associated with a training data set.
Since standard NF implement differentiable maps, they may suffer from pathological behaviors when targeting complex distributions.
This paper proposes a new Markov chain Monte Carlo algorithm to sample from the target distribution in the latent domain before transporting it back to the target domain.
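A minimal sketch of the idea, under stated assumptions: run Langevin dynamics on the latent variable $z$, where the pulled-back target is typically better behaved, then push the samples back through the flow. `flow_forward` and `grad_latent_logp` are hypothetical stand-ins for the trained flow $T(z)$ and the gradient of $\log \pi(T(z)) + \log|\det J_T(z)|$; the unadjusted update below is a simplification, not necessarily the paper's exact MCMC algorithm.

```python
# Minimal sketch: unadjusted Langevin dynamics in the latent space of a
# normalizing flow, followed by a push-forward through the flow.
import numpy as np

def latent_langevin(z0, grad_latent_logp, flow_forward, step=1e-3,
                    n_steps=1000, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    z, latents = z0.copy(), []
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z + step * grad_latent_logp(z) + np.sqrt(2 * step) * noise
        latents.append(z.copy())
    latents = np.stack(latents)
    return latents, flow_forward(latents)   # samples in latent and data space
```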
arXiv Detail & Related papers (2023-05-20T09:31:35Z) - An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws [24.356906682593532]
We study the compute-optimal trade-off between model and training data set sizes for large neural networks.
Our result suggests a linear relation similar to that supported by the empirical analysis of Chinchilla.
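To make the trade-off concrete, a Chinchilla-style allocation minimises a parametric loss $L(N, D) = E + A/N^{\alpha} + B/D^{\beta}$ subject to a compute budget $C \approx 6ND$. The sketch below is illustrative only: the constants are placeholders, not fitted values, and this is not the paper's information-theoretic derivation.

```python
# Minimal sketch of compute-optimal allocation under a parametric loss and
# the constraint C = 6 * N * D (N = parameters, D = training tokens).
import numpy as np

def optimal_allocation(C, A=400.0, B=400.0, alpha=0.34, beta=0.28, E=1.7):
    """Grid-search N on the constraint D = C / (6 N); return the argmin of L."""
    N = np.logspace(6, 12, 2000)
    D = C / (6.0 * N)
    loss = E + A / N**alpha + B / D**beta
    i = np.argmin(loss)
    return N[i], D[i], loss[i]

# Doubling compute scales the optimal N and D up roughly in proportion to each
# other, i.e. an approximately linear relation between model and data size.
for C in (1e21, 2e21, 4e21):
    print(optimal_allocation(C))
```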
arXiv Detail & Related papers (2022-12-02T18:46:41Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that this phenomenon (the benefit of large learning rates) can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
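The spectral picture behind this claim can be illustrated directly: gradient descent on a quadratic converges coordinate-wise in the eigenbasis of the Hessian, so the learning rate and the stopping time jointly decide which spectral components of the solution are resolved. The numbers below are illustrative, not taken from the paper.

```python
# Minimal sketch: early-stopped gradient descent on a quadratic acts as a
# spectral filter; a larger (still stable) learning rate recovers more of the
# small-eigenvalue tail within the same number of steps.
import numpy as np

rng = np.random.default_rng(0)
eigvals = np.logspace(0, -4, 6)          # spectrum of the quadratic's Hessian
w_star = rng.standard_normal(6)          # minimiser, expressed in the eigenbasis

def gd_in_eigenbasis(lr, n_steps):
    """Starting from w = 0, coordinate i equals (1 - (1 - lr*lam_i)^t) w*_i."""
    return (1.0 - (1.0 - lr * eigvals) ** n_steps) * w_star

small_lr = gd_in_eigenbasis(lr=0.1, n_steps=100)   # only the top modes resolved
large_lr = gd_in_eigenbasis(lr=1.9, n_steps=100)   # more of the tail recovered
```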
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
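As a rough illustration of the two-step idea (and explicitly not the paper's sampling algorithm): a PSD model represents a non-negative function as $f(x) = \sum_{ij} A_{ij}\, k(x, x_i)\, k(x, x_j)$ with $A$ positive semi-definite, which can then be sampled from. The kernel centres and coefficient matrix below are arbitrary illustrative choices, and the sampler is naive rejection against a Gaussian proposal.

```python
# Minimal 1D illustration of sampling from a PSD model by rejection sampling.
import numpy as np

rng = np.random.default_rng(0)
centers = np.array([-2.0, 0.0, 2.5])              # kernel centres x_i (illustrative)
L_ = rng.standard_normal((3, 3)); A = L_ @ L_.T   # any PSD coefficient matrix

def k(x, c, bw=1.0):
    return np.exp(-0.5 * ((x - c) / bw) ** 2)

def f(x):
    """PSD model: f(x) = sum_ij A_ij k(x, x_i) k(x, x_j) >= 0 (unnormalised)."""
    phi = k(np.asarray(x, dtype=float)[..., None], centers)
    return np.einsum('...i,ij,...j->...', phi, A, phi)

def proposal_pdf(x, scale=4.0):
    return np.exp(-0.5 * (x / scale) ** 2) / (scale * np.sqrt(2 * np.pi))

def rejection_sample(n, scale=4.0):
    grid = np.linspace(-10, 10, 4001)
    M = 1.05 * np.max(f(grid) / proposal_pdf(grid, scale))  # empirical envelope only
    out = []
    while len(out) < n:
        x = rng.normal(0.0, scale)
        if rng.uniform() < f(x) / (M * proposal_pdf(x, scale)):
            out.append(x)
    return np.array(out)
```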
arXiv Detail & Related papers (2021-10-20T12:25:22Z) - Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
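A hand-crafted baseline of what such a network learns might look like the following: new points are emitted as weighted combinations of each point's nearest neighbours. Here the weights are fixed midpoint weights purely for illustration; the paper's network instead learns the weights and adds a high-order refinement term.

```python
# Minimal sketch: upsample a point cloud by emitting, for each point, the
# midpoints to its k nearest neighbours (fixed weights; not the learned model).
import numpy as np

def upsample_knn(points, k=4):
    """points: (N, 3) array; returns the original points plus N*k new midpoints."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                          # exclude self-matches
    nbrs = np.argsort(d2, axis=1)[:, :k]                  # (N, k) neighbour indices
    new_pts = 0.5 * (points[:, None, :] + points[nbrs])   # midpoint weights
    return np.concatenate([points, new_pts.reshape(-1, 3)], axis=0)
```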
arXiv Detail & Related papers (2020-11-25T14:00:18Z) - Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors.
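The identity behind pathwise conditioning is Matheron's rule: a posterior sample equals a prior sample plus a data-driven update, $f_* \mid y \overset{d}{=} f_* + K_{*X}(K_{XX} + \sigma^2 I)^{-1}(y - f_X - \varepsilon)$. The sketch below draws the joint prior exactly via a Cholesky factor for clarity; efficient variants replace that step with cheap approximate prior draws (e.g. random features) so that only the $n \times n$ training solve remains expensive. The RBF kernel and noise level are illustrative choices.

```python
# Minimal sketch of Matheron's rule for sampling a GP posterior pathwise.
import numpy as np

def rbf(a, b, ls=0.5):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def posterior_sample(x_train, y_train, x_test, noise=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    x_all = np.concatenate([x_test, x_train])
    K_all = rbf(x_all, x_all) + 1e-9 * np.eye(len(x_all))
    f_all = np.linalg.cholesky(K_all) @ rng.standard_normal(len(x_all))  # joint prior draw
    f_test, f_train = f_all[:len(x_test)], f_all[len(x_test):]
    eps = noise * rng.standard_normal(len(x_train))
    K_nn = rbf(x_train, x_train) + noise**2 * np.eye(len(x_train))
    update = rbf(x_test, x_train) @ np.linalg.solve(K_nn, y_train - f_train - eps)
    return f_test + update          # exact draw from the GP posterior at x_test
```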
arXiv Detail & Related papers (2020-11-08T17:09:37Z) - The Boomerang Sampler [4.588028371034406]
This paper introduces the Boomerang Sampler as a novel class of continuous-time non-reversible Markov chain Monte Carlo algorithms.
We show that the method is easy to implement and demonstrate empirically that it can outperform existing benchmark piecewise deterministic Markov processes.
arXiv Detail & Related papers (2020-06-24T14:52:22Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
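For reference, the standard baseline in this setting is the straight-through estimator, sketched below; it is explicitly not the sample-analytic estimator proposed in the paper. The sign activation has zero gradient almost everywhere, so the backward pass simply passes the incoming gradient through, optionally clipped to the region where the pre-activation is small.

```python
# Minimal sketch of the straight-through estimator (a common baseline, not the
# paper's method) for a single layer with a deterministic sign activation.
import numpy as np

def binary_layer_forward(x, W):
    pre = x @ W
    return np.where(pre >= 0.0, 1.0, -1.0), pre

def binary_layer_backward(grad_out, pre, x):
    """Straight-through: treat d sign(u)/du as 1 on |u| <= 1, else 0."""
    grad_pre = grad_out * (np.abs(pre) <= 1.0)
    return x.T @ grad_pre          # gradient with respect to W

# Toy usage: one layer, squared loss against a random target.
rng = np.random.default_rng(0)
x, W = rng.standard_normal((8, 4)), 0.1 * rng.standard_normal((4, 3))
y, pre = binary_layer_forward(x, W)
target = rng.standard_normal((8, 3))
grad_W = binary_layer_backward(y - target, pre, x)
```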
arXiv Detail & Related papers (2020-06-04T21:51:21Z)