Towards a theory of machine learning
- URL: http://arxiv.org/abs/2004.09280v4
- Date: Fri, 12 Feb 2021 17:41:30 GMT
- Title: Towards a theory of machine learning
- Authors: Vitaly Vanchurin
- Abstract summary: We define a neural network as a septuple consisting of (1) a state vector, (2) an input projection, (3) an output projection, (4) a weight matrix, (5) a bias vector, (6) an activation map and (7) a loss function.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We define a neural network as a septuple consisting of (1) a state vector,
(2) an input projection, (3) an output projection, (4) a weight matrix, (5) a
bias vector, (6) an activation map and (7) a loss function. We argue that the
loss function can be imposed either on the boundary (i.e. input and/or output
neurons) or in the bulk (i.e. hidden neurons) for both supervised and
unsupervised systems. We apply the principle of maximum entropy to derive a
canonical ensemble of the state vectors subject to a constraint imposed on the
bulk loss function by a Lagrange multiplier (or an inverse temperature
parameter). We show that in an equilibrium the canonical partition function
must be a product of two factors: a function of the temperature and a function
of the bias vector and weight matrix. Consequently, the total Shannon entropy
consists of two terms which represent respectively a thermodynamic entropy and
a complexity of the neural network. We derive the first and second laws of
learning: during learning the total entropy must decrease until the system
reaches an equilibrium (i.e. the second law), and the increment in the loss
function must be proportional to the increment in the thermodynamic entropy
plus the increment in the complexity (i.e. the first law). We calculate the
entropy destruction to show that the efficiency of learning is given by the
Laplacian of the total free energy which is to be maximized in an optimal
neural architecture, and explain why the optimization condition is better
satisfied in a deep network with a large number of hidden layers. The key
properties of the model are verified numerically by training a supervised
feedforward neural network using the method of stochastic gradient descent. We
also discuss a possibility that the entire universe on its most fundamental
level is a neural network.
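
For readers who want the statistical-mechanics claims of the abstract spelled out, the block below is a hedged sketch, in notation assumed for illustration rather than taken from the paper, of the canonical ensemble over state vectors, the claimed equilibrium factorization of the partition function, and the resulting split of the total Shannon entropy into a thermodynamic term and a complexity term.

```latex
% Assumed notation: x = state vector, \hat{w} = weight matrix, b = bias vector,
% H = bulk loss function, \beta = Lagrange multiplier (inverse temperature).
% Maximum-entropy (canonical) distribution over state vectors:
%   p(x) = exp(-\beta H(x, \hat{w}, b)) / Z(\beta, \hat{w}, b)
\[
  Z(\beta,\hat{w},\mathbf{b}) \;=\; \int d\mathbf{x}\; e^{-\beta H(\mathbf{x},\hat{w},\mathbf{b})}
  \;\overset{\text{equilibrium}}{=}\; \mathcal{Z}(\beta)\,\mathcal{N}(\hat{w},\mathbf{b})
\]
\[
  S_{\text{tot}} \;=\; \beta\,\langle H\rangle + \log Z
  \;=\; \underbrace{\beta\,\langle H\rangle + \log \mathcal{Z}(\beta)}_{\text{thermodynamic entropy } S_{\text{th}}}
  \;+\; \underbrace{\log \mathcal{N}(\hat{w},\mathbf{b})}_{\text{complexity } C}
\]
\[
  d\langle H\rangle \;\propto\; dS_{\text{th}} + dC
  \qquad \text{(first law of learning, schematically)}
\]
```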
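The abstract also mentions a numerical verification by training a supervised feedforward network with stochastic gradient descent. The snippet below is a minimal, self-contained illustration of that kind of experiment, tracking the training loss alongside a rough entropy proxy for the hidden ("bulk") neurons; the data, architecture, and histogram-based entropy estimator are assumptions chosen for illustration, not the paper's actual setup.

```python
# Minimal sketch (not the author's code): a two-layer feedforward network
# trained with plain SGD while logging the boundary loss and a crude
# Shannon-entropy proxy for the hidden activations.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x) plus noise.
X = rng.uniform(-np.pi, np.pi, size=(512, 1))
Y = np.sin(X) + 0.05 * rng.normal(size=X.shape)

# One hidden layer of tanh units.
n_hidden = 32
W1 = rng.normal(scale=0.5, size=(1, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)

def entropy_proxy(h, bins=20):
    """Crude Shannon-entropy estimate of hidden activations via a histogram."""
    counts, _ = np.histogram(h.ravel(), bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

lr, batch = 0.05, 32
for step in range(2001):
    idx = rng.choice(len(X), batch, replace=False)
    x, y = X[idx], Y[idx]

    # Forward pass.
    h = np.tanh(x @ W1 + b1)          # hidden ("bulk") neurons
    y_hat = h @ W2 + b2               # output ("boundary") neurons
    loss = np.mean((y_hat - y) ** 2)  # boundary loss function

    # Backward pass (manual gradients for the two-layer net).
    g_out = 2.0 * (y_hat - y) / batch
    gW2 = h.T @ g_out; gb2 = g_out.sum(axis=0)
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)
    gW1 = x.T @ g_h; gb1 = g_h.sum(axis=0)

    # Stochastic gradient descent update.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

    if step % 500 == 0:
        print(f"step {step:5d}  loss {loss:.4f}  "
              f"hidden-entropy proxy {entropy_proxy(h):.3f}")
```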
Related papers
- Designing a Linearized Potential Function in Neural Network Optimization Using Csiszár Type of Tsallis Entropy [0.0]
In this paper, we establish a framework that utilizes a linearized potential function via the Csiszár type of Tsallis entropy.
We show that our new framework enables us to derive an exponential convergence result.
arXiv Detail & Related papers (2024-11-06T02:12:41Z) - Confidence Regulation Neurons in Language Models [91.90337752432075]
This study investigates the mechanisms by which large language models represent and regulate uncertainty in next-token predictions.
Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits.
Token frequency neurons, which we describe here for the first time, boost or suppress each token's logit proportionally to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution.
arXiv Detail & Related papers (2024-06-24T01:31:03Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Machine learning one-dimensional spinless trapped fermionic systems with
neural-network quantum states [1.6606527887256322]
We compute the ground-state properties of fully polarized, trapped, one-dimensional fermionic systems interacting through a Gaussian potential.
We use an antisymmetric artificial neural network, or neural quantum state, as an ansatz for the wavefunction.
We find very different ground states depending on the sign of the interaction.
arXiv Detail & Related papers (2023-04-10T17:36:52Z) - Correlation between entropy and generalizability in a neural network [9.223853439465582]
We use the Wang-Landau Monte Carlo algorithm to calculate the entropy at a given test accuracy.
Our results show that entropic forces help generalizability.
arXiv Detail & Related papers (2022-07-05T12:28:13Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Aspects of Pseudo Entropy in Field Theories [0.0]
We numerically analyze a class of free scalar field theories and the XY spin model.
This reveals the basic properties of pseudo entropy in many-body systems.
We find that the non-positivity of the difference can be violated only if the initial and final states belong to different quantum phases.
arXiv Detail & Related papers (2021-06-06T13:25:35Z) - Better Regularization for Sequential Decision Spaces: Fast Convergence
Rates for Nash, Correlated, and Team Equilibria [121.36609493711292]
We study the application of iterative first-order methods to the problem of computing equilibria of large-scale two-player extensive-form games.
By instantiating first-order methods with our regularizers, we develop the first accelerated first-order methods for computing correlated equilibria and ex-ante coordinated team equilibria.
arXiv Detail & Related papers (2021-05-27T06:10:24Z) - Action Redundancy in Reinforcement Learning [54.291331971813364]
We show that transition entropy can be decomposed into two terms: a model-dependent transition entropy and action redundancy.
Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
arXiv Detail & Related papers (2021-02-22T19:47:26Z) - Pseudo Entropy in Free Quantum Field Theories [0.0]
We present two novel properties of pseudo entropy, which we conjecture to be universal in field theories.
Our numerical results imply that pseudo entropy can play a role as a new quantum order parameter.
arXiv Detail & Related papers (2020-11-19T04:25:18Z) - Variational Monte Carlo calculations of $\mathbf{A\leq 4}$ nuclei with
an artificial neural-network correlator ansatz [62.997667081978825]
We introduce a neural-network quantum state ansatz to model the ground-state wave function of light nuclei.
We compute the binding energies and point-nucleon densities of $A \leq 4$ nuclei as emerging from a leading-order pionless effective field theory Hamiltonian.
arXiv Detail & Related papers (2020-07-28T14:52:28Z)