Towards a theory of machine learning
- URL: http://arxiv.org/abs/2004.09280v4
- Date: Fri, 12 Feb 2021 17:41:30 GMT
- Title: Towards a theory of machine learning
- Authors: Vitaly Vanchurin
- Abstract summary: We define a neural network as a septuple consisting of (1) a state vector, (2) an input projection, (3) an output projection, (4) a weight matrix, (5) a bias vector, (6) an activation map and (7) a loss function.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We define a neural network as a septuple consisting of (1) a state vector,
(2) an input projection, (3) an output projection, (4) a weight matrix, (5) a
bias vector, (6) an activation map and (7) a loss function. We argue that the
loss function can be imposed either on the boundary (i.e. input and/or output
neurons) or in the bulk (i.e. hidden neurons) for both supervised and
unsupervised systems. We apply the principle of maximum entropy to derive a
canonical ensemble of the state vectors subject to a constraint imposed on the
bulk loss function by a Lagrange multiplier (or an inverse temperature
parameter). We show that in an equilibrium the canonical partition function
must be a product of two factors: a function of the temperature and a function
of the bias vector and weight matrix. Consequently, the total Shannon entropy
consists of two terms which represent respectively a thermodynamic entropy and
a complexity of the neural network. We derive the first and second laws of
learning: during learning the total entropy must decrease until the system
reaches an equilibrium (i.e. the second law), and the increment in the loss
function must be proportional to the increment in the thermodynamic entropy
plus the increment in the complexity (i.e. the first law). We calculate the
entropy destruction to show that the efficiency of learning is given by the
Laplacian of the total free energy which is to be maximized in an optimal
neural architecture, and explain why the optimization condition is better
satisfied in a deep network with a large number of hidden layers. The key
properties of the model are verified numerically by training a supervised
feedforward neural network using the method of stochastic gradient descent. We
also discuss a possibility that the entire universe on its most fundamental
level is a neural network.
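
For readers who want the statistical-mechanics claims of the abstract spelled out, the block below is a hedged sketch, in notation assumed for illustration rather than taken from the paper, of the canonical ensemble over state vectors, the claimed equilibrium factorization of the partition function, and the resulting split of the total Shannon entropy into a thermodynamic term and a complexity term.

```latex
% Assumed notation: x = state vector, \hat{w} = weight matrix, b = bias vector,
% H = bulk loss function, \beta = Lagrange multiplier (inverse temperature).
% Maximum-entropy (canonical) distribution over state vectors:
%   p(x) = exp(-\beta H(x, \hat{w}, b)) / Z(\beta, \hat{w}, b)
\[
  Z(\beta,\hat{w},\mathbf{b}) \;=\; \int d\mathbf{x}\; e^{-\beta H(\mathbf{x},\hat{w},\mathbf{b})}
  \;\overset{\text{equilibrium}}{=}\; \mathcal{Z}(\beta)\,\mathcal{N}(\hat{w},\mathbf{b})
\]
\[
  S_{\text{tot}} \;=\; \beta\,\langle H\rangle + \log Z
  \;=\; \underbrace{\beta\,\langle H\rangle + \log \mathcal{Z}(\beta)}_{\text{thermodynamic entropy } S_{\text{th}}}
  \;+\; \underbrace{\log \mathcal{N}(\hat{w},\mathbf{b})}_{\text{complexity } C}
\]
\[
  d\langle H\rangle \;\propto\; dS_{\text{th}} + dC
  \qquad \text{(first law of learning, schematically)}
\]
```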
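The abstract also mentions a numerical verification by training a supervised feedforward network with stochastic gradient descent. The snippet below is a minimal, self-contained illustration of that kind of experiment, tracking the training loss alongside a rough entropy proxy for the hidden ("bulk") neurons; the data, architecture, and histogram-based entropy estimator are assumptions chosen for illustration, not the paper's actual setup.

```python
# Minimal sketch (not the author's code): a two-layer feedforward network
# trained with plain SGD while logging the boundary loss and a crude
# Shannon-entropy proxy for the hidden activations.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x) plus noise.
X = rng.uniform(-np.pi, np.pi, size=(512, 1))
Y = np.sin(X) + 0.05 * rng.normal(size=X.shape)

# One hidden layer of tanh units.
n_hidden = 32
W1 = rng.normal(scale=0.5, size=(1, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)

def entropy_proxy(h, bins=20):
    """Crude Shannon-entropy estimate of hidden activations via a histogram."""
    counts, _ = np.histogram(h.ravel(), bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

lr, batch = 0.05, 32
for step in range(2001):
    idx = rng.choice(len(X), batch, replace=False)
    x, y = X[idx], Y[idx]

    # Forward pass.
    h = np.tanh(x @ W1 + b1)          # hidden ("bulk") neurons
    y_hat = h @ W2 + b2               # output ("boundary") neurons
    loss = np.mean((y_hat - y) ** 2)  # boundary loss function

    # Backward pass (manual gradients for the two-layer net).
    g_out = 2.0 * (y_hat - y) / batch
    gW2 = h.T @ g_out; gb2 = g_out.sum(axis=0)
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)
    gW1 = x.T @ g_h; gb1 = g_h.sum(axis=0)

    # Stochastic gradient descent update.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

    if step % 500 == 0:
        print(f"step {step:5d}  loss {loss:.4f}  "
              f"hidden-entropy proxy {entropy_proxy(h):.3f}")
```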
Related papers
- Designing a Linearized Potential Function in Neural Network Optimization Using Csiszár Type of Tsallis Entropy [0.0]
In this paper, we establish a framework that utilizes a linearized potential function via the Csiszár type of Tsallis entropy.
We show that our new framework enables us to derive an exponential convergence result.
arXiv Detail & Related papers (2024-11-06T02:12:41Z) - Confidence Regulation Neurons in Language Models [91.90337752432075]
This study investigates the mechanisms by which large language models represent and regulate uncertainty in next-token predictions.
Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits.
Token frequency neurons, which we describe here for the first time, boost or suppress each token's logit proportionally to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution.
arXiv Detail & Related papers (2024-06-24T01:31:03Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Machine learning one-dimensional spinless trapped fermionic systems with
neural-network quantum states [1.6606527887256322]
We compute the ground-state properties of fully polarized, trapped, one-dimensional fermionic systems interacting through a Gaussian potential.
We use an antisymmetric artificial neural network, or neural quantum state, as an ansatz for the wavefunction.
We find very different ground states depending on the sign of the interaction.
arXiv Detail & Related papers (2023-04-10T17:36:52Z) - Correlation between entropy and generalizability in a neural network [9.223853439465582]
We use the Wang-Landau Monte Carlo algorithm to calculate the entropy at a given test accuracy.
Our results show that entropic forces help generalizability.
arXiv Detail & Related papers (2022-07-05T12:28:13Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Aspects of Pseudo Entropy in Field Theories [0.0]
We numerically analyze a class of free scalar field theories and the XY spin model.
This reveals the basic properties of pseudo entropy in many-body systems.
We find that the non-positivity of the difference can be violated only if the initial and final states belong to different quantum phases.
arXiv Detail & Related papers (2021-06-06T13:25:35Z) - Better Regularization for Sequential Decision Spaces: Fast Convergence
Rates for Nash, Correlated, and Team Equilibria [121.36609493711292]
We study the application of iterative first-order methods to the problem of computing equilibria of large-scale two-player extensive-form games.
By instantiating first-order methods with our regularizers, we develop the first accelerated first-order methods for computing correlated equilibria and ex-ante coordinated team equilibria.
arXiv Detail & Related papers (2021-05-27T06:10:24Z) - Action Redundancy in Reinforcement Learning [54.291331971813364]
We show that transition entropy can be decomposed into two terms: a model-dependent transition entropy and action redundancy.
Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
arXiv Detail & Related papers (2021-02-22T19:47:26Z) - Pseudo Entropy in Free Quantum Field Theories [0.0]
We present two novel properties of pseudo entropy, which we conjecture to be universal in field theories.
Our numerical results imply that pseudo entropy can play a role as a new quantum order parameter.
arXiv Detail & Related papers (2020-11-19T04:25:18Z) - Variational Monte Carlo calculations of $\mathbf{A\leq 4}$ nuclei with
an artificial neural-network correlator ansatz [62.997667081978825]
We introduce a neural-network quantum state ansatz to model the ground-state wave function of light nuclei.
We compute the binding energies and point-nucleon densities of $A \leq 4$ nuclei as emerging from a leading-order pionless effective field theory Hamiltonian.
arXiv Detail & Related papers (2020-07-28T14:52:28Z)