Hierarchical autoregressive neural networks for statistical systems
- URL: http://arxiv.org/abs/2203.10989v1
- Date: Mon, 21 Mar 2022 13:55:53 GMT
- Title: Hierarchical autoregressive neural networks for statistical systems
- Authors: Piotr Białas, Piotr Korcyl, Tomasz Stebel
- Abstract summary: We propose a hierarchical association of physical degrees of freedom, for instance spins, to neurons, which replaces the scaling with the total number of degrees of freedom by a scaling with the linear extent $L$ of the system.
We demonstrate our approach on the two-dimensional Ising model by simulating lattices of various sizes up to $128 \times 128$ spins, with time benchmarks reaching lattices of size $512 \times 512$.
- Score: 0.05156484100374058
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: It was recently proposed that neural networks could be used to approximate
many-dimensional probability distributions that appear e.g. in lattice field
theories or statistical mechanics. Subsequently they can be used as variational
approximators to assess extensive properties of statistical systems, like free
energy, and also as neural samplers used in Monte Carlo simulations. The
practical application of this approach is unfortunately limited by its
unfavorable scaling with the system size of both the numerical cost required for
training and the memory requirements. This is due to the fact that the
original proposition involved a neural network of width which scaled with the
total number of degrees of freedom, e.g. $L^2$ in the case of a two-dimensional
$L\times L$ lattice. In this work we propose a hierarchical association of
physical degrees of freedom, for instance spins, to neurons, which replaces this
scaling by one with the linear extent $L$ of the system. We demonstrate our
approach on the two-dimensional Ising model by simulating lattices of various
sizes up to $128 \times 128$ spins, with time benchmarks reaching lattices of
size $512 \times 512$. We observe that our proposal improves the quality of
neural network training, i.e. the approximated probability distribution is
closer to the target than could previously be achieved. As a consequence, the
variational free energy reaches a value closer to its theoretical expectation
and, if applied in a Markov Chain Monte Carlo algorithm, the resulting
autocorrelation time is smaller. Finally, the replacement of a single neural
network by a hierarchy of smaller networks considerably reduces the memory
requirements.
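To make the scaling issue described above concrete, here is a minimal sketch (not the authors' implementation) of a single "flat" autoregressive neural sampler for an $L \times L$ Ising lattice, written in PyTorch. Its masked layer conditions every spin on all preceding spins, so the network width grows with the total number of spins $L^2$, which is exactly the scaling the hierarchical construction is meant to replace. It also estimates the variational free energy $F_q = \frac{1}{\beta}\,\mathbb{E}_{s \sim q}\left[\log q(s) + \beta E(s)\right]$. All hyperparameters (lattice size, inverse temperature, batch size, learning rate, number of steps) are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only: a "flat" variational autoregressive sampler for the
# 2D Ising model, in the spirit of the approach the abstract builds on.  The
# masked linear layer has width L^2, i.e. it scales with the total number of
# degrees of freedom rather than with the linear extent L.
import torch

L = 8                      # linear lattice extent (illustrative choice)
N = L * L                  # total number of spins -> network width ~ L^2
beta = 0.44                # inverse temperature, close to criticality (assumed)

# Strictly lower-triangular mask enforces the autoregressive ordering:
# the conditional for spin i may only depend on spins 0..i-1.
mask = torch.tril(torch.ones(N, N), diagonal=-1)


class FlatAutoregressive(torch.nn.Module):
    """q(s) = prod_i q(s_i | s_<i) with logistic conditionals."""

    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(0.01 * torch.randn(N, N))
        self.bias = torch.nn.Parameter(torch.zeros(N))

    def conditional_logits(self, s):
        # s: (batch, N) with entries +-1; logits of q(s_i = +1 | s_<i)
        return s @ (self.weight * mask).t() + self.bias

    @torch.no_grad()
    def sample(self, batch):
        s = torch.zeros(batch, N)
        for i in range(N):                       # sequential ancestral sampling
            p_up = torch.sigmoid(self.conditional_logits(s)[:, i])
            s[:, i] = 2.0 * torch.bernoulli(p_up) - 1.0
        return s

    def log_prob(self, s):
        logits = self.conditional_logits(s)
        # log q(s_i | s_<i) = log sigmoid(s_i * logit_i) for s_i = +-1
        return -torch.nn.functional.softplus(-s * logits).sum(dim=1)


def ising_energy(s):
    """Nearest-neighbour Ising energy with periodic boundaries and J = 1."""
    grid = s.view(-1, L, L)
    right = (grid * torch.roll(grid, shifts=-1, dims=2)).sum(dim=(1, 2))
    down = (grid * torch.roll(grid, shifts=-1, dims=1)).sum(dim=(1, 2))
    return -(right + down)


model = FlatAutoregressive()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):                          # toy training loop
    s = model.sample(256)
    log_q = model.log_prob(s)
    with torch.no_grad():
        # per-sample estimator of beta*F: log q(s) + beta*E(s)
        signal = log_q + beta * ising_energy(s)
        baseline = signal.mean()
    # REINFORCE-style gradient of the variational free energy
    loss = ((signal - baseline) * log_q).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

F_q = baseline.item() / beta   # variational free energy estimate (upper bound on F)
```

As described in the abstract, the hierarchical proposal replaces this single $L^2$-wide network with a hierarchy of smaller networks whose width grows only with the linear extent $L$; the sketch above implements only the flat baseline.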
Related papers
- Approximation with Random Shallow ReLU Networks with Applications to Model Reference Adaptive Control [0.0]
We show that ReLU networks with randomly generated weights and biases achieve $L_\infty$ error of $O(m^{-1/2})$ with high probability.
We show how the result can be used to get approximations of required accuracy in a model reference adaptive control application.
arXiv Detail & Related papers (2024-03-25T19:39:17Z) - A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z) - Fast, Distribution-free Predictive Inference for Neural Networks with
Coverage Guarantees [25.798057062452443]
This paper introduces a novel, computationally efficient algorithm for predictive inference (PI).
It requires no distributional assumptions on the data and can be computed faster than existing bootstrap-type methods for neural networks.
arXiv Detail & Related papers (2023-06-11T04:03:58Z) - Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO
Regularization [15.517787031620864]
The territory of LASSO is extended to two-layer ReLU neural networks, a fashionable and powerful nonlinear regression model.
We show that the LASSO estimator can stably reconstruct the neural network and identify $\mathcal{S}^\star$ when the number of samples scales logarithmically.
Our theory lies in an extended Restricted Isometry Property (RIP)-based analysis framework for two-layer ReLU neural networks.
arXiv Detail & Related papers (2023-05-07T13:05:09Z) - Continuous time recurrent neural networks: overview and application to
forecasting blood glucose in the intensive care unit [56.801856519460465]
Continuous time autoregressive recurrent neural networks (CTRNNs) are deep learning models that account for irregular observations.
We demonstrate the application of these models to probabilistic forecasting of blood glucose in a critical care setting.
arXiv Detail & Related papers (2023-04-14T09:39:06Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics [85.31710759801705]
Current practice requires expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z) - The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z) - Fundamental tradeoffs between memorization and robustness in random
features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev seminorm is lower-bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all information) and is not responsible for any consequences of its use.