A theory of learning with constrained weight-distribution
- URL: http://arxiv.org/abs/2206.08933v1
- Date: Tue, 14 Jun 2022 00:43:34 GMT
- Title: A theory of learning with constrained weight-distribution
- Authors: Weishun Zhong, Ben Sorscher, Daniel D Lee, Haim Sompolinsky
- Abstract summary: We develop a statistical mechanical theory of learning in neural networks that incorporates structural information as constraints.
We show that training in our algorithm can be interpreted as geodesic flows in the Wasserstein space of probability distributions.
Our theory and algorithm provide novel strategies for incorporating prior knowledge about weights into learning, and reveal a powerful connection between structure and function in neural networks.
- Score: 17.492950552276067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central question in computational neuroscience is how structure determines
function in neural networks. The emerging high-quality large-scale connectomic
datasets raise the question of what general functional principles can be
gleaned from structural information such as the distribution of
excitatory/inhibitory synapse types and the distribution of synaptic weights.
Motivated by this question, we developed a statistical mechanical theory of
learning in neural networks that incorporates structural information as
constraints. We derived an analytical solution for the memory capacity of the
perceptron, a basic feedforward model of supervised learning, with a
constraint on the distribution of its weights. Our theory predicts that the
reduction in capacity due to the constrained weight distribution is related to
the Wasserstein distance between the imposed distribution and the standard
normal distribution. To test the theoretical predictions, we use optimal
transport theory and information geometry to develop an SGD-based algorithm to
find weights that simultaneously learn the input-output task and satisfy the
distribution constraint. We show that training in our algorithm can be
interpreted as geodesic flows in the Wasserstein space of probability
distributions. We further developed a statistical mechanical theory for
teacher-student perceptron rule learning and asked how the student can best
incorporate prior knowledge of the rule. Our theory shows that it is
beneficial for the learner to adopt different prior weight distributions during
learning, and that distribution-constrained learning outperforms unconstrained
and sign-constrained learning. Our theory and algorithm provide
novel strategies for incorporating prior knowledge about weights into learning,
and reveal a powerful connection between structure and function in neural
networks.
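To make the constrained-learning idea above concrete, the following is a minimal, hypothetical Python/NumPy sketch, not the authors' actual algorithm (which is formulated as geodesic flows in Wasserstein space): plain perceptron-style SGD steps are alternated with a projection of the weight vector onto a target weight distribution. In one dimension, the optimal-transport map onto the target is the quantile-matching (sorting) map, and the same sorted representation gives the empirical Wasserstein-2 distance that appears in the capacity prediction. The toy task, the log-normal target, and all names are illustrative assumptions.

```python
# Minimal illustrative sketch (assumptions, not the paper's exact algorithm):
# perceptron-style SGD alternated with a projection of the weights onto a
# prescribed weight distribution. In 1D, the optimal-transport map onto the
# target is the quantile-matching (sorting) map; the same sorted view yields
# the empirical Wasserstein-2 distance used in the capacity prediction.
import numpy as np

rng = np.random.default_rng(0)

def project_to_distribution(w, target_sorted):
    """Rank-preserving (1D optimal-transport) projection: the i-th smallest
    weight is replaced by the i-th smallest sample of the target."""
    w_proj = np.empty_like(target_sorted)
    w_proj[np.argsort(w)] = target_sorted
    return w_proj

def wasserstein2(w, target_sorted):
    """Empirical 1D Wasserstein-2 distance between the weights' empirical
    distribution and the target, computed from sorted samples."""
    return np.sqrt(np.mean((np.sort(w) - target_sorted) ** 2))

# Toy capacity-style task: P random patterns with random +/-1 labels.
N, P = 200, 100
X = rng.standard_normal((P, N))
y = rng.choice([-1.0, 1.0], size=P)

# Hypothetical target weight distribution: log-normal (all positive, as for
# excitatory synapses), rescaled so that ||w||^2 = N.
target_sorted = np.sort(rng.lognormal(mean=0.0, sigma=1.0, size=N))
target_sorted *= np.sqrt(N) / np.linalg.norm(target_sorted)

w = project_to_distribution(rng.standard_normal(N), target_sorted)
lr, kappa = 0.05, 0.1  # learning rate and target margin
for epoch in range(500):
    margins = y * (X @ w) / np.sqrt(N)
    bad = margins < kappa            # patterns not yet learned with margin kappa
    if not bad.any():
        break
    w = w + lr * (y[bad, None] * X[bad]).mean(axis=0)  # perceptron-style update
    w = project_to_distribution(w, target_sorted)      # re-impose the constraint

print("smallest margin:", float((y * (X @ w) / np.sqrt(N)).min()))
print("W2(weights, target):", float(wasserstein2(w, target_sorted)))
# W2 to a standard normal could be computed the same way against sorted
# Gaussian samples, to relate the constraint to the predicted capacity loss.
```

The projection step is the simplest stand-in for the paper's optimal-transport machinery: after every gradient step the weights are forced back onto the exact target distribution, at the cost of ignoring the geometry-aware (geodesic) updates the paper describes.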
Related papers
- Learning Theory of Distribution Regression with Neural Networks [6.961253535504979]
We establish an approximation theory and a learning theory of distribution regression via a fully connected neural network (FNN)
In contrast to the classical regression methods, the input variables of distribution regression are probability measures.
arXiv Detail & Related papers (2023-07-07T09:49:11Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST)
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - Statistical mechanics of continual learning: variational principle and
mean-field potential [1.559929646151698]
We focus on continual learning in single-layered and multi-layered neural networks of binary weights.
A variational Bayesian learning setting is proposed, where the neural networks are trained in a field-space.
Weight uncertainty is naturally incorporated, and modulates synaptic resources among tasks.
Our proposed frameworks also connect to elastic weight consolidation, weight-uncertainty learning, and neuroscience inspired metaplasticity.
arXiv Detail & Related papers (2022-12-06T09:32:45Z) - Statistical Physics of Deep Neural Networks: Initialization toward
Optimal Channels [6.144858413112823]
In deep learning, neural networks serve as noisy channels between input data and its representation.
We study a frequently overlooked possibility that neural networks can be intrinsic toward optimal channels.
arXiv Detail & Related papers (2022-12-04T05:13:01Z) - The Causal Neural Connection: Expressiveness, Learnability, and
Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z) - Credit Assignment in Neural Networks through Deep Feedback Control [59.14935871979047]
Deep Feedback Control (DFC) is a new learning method that uses a feedback controller to drive a deep neural network to match a desired output target and whose control signal can be used for credit assignment.
The resulting learning rule is fully local in space and time and approximates Gauss-Newton optimization for a wide range of connectivity patterns.
To further underline its biological plausibility, we relate DFC to a multi-compartment model of cortical pyramidal neurons with a local voltage-dependent synaptic plasticity rule, consistent with recent theories of dendritic processing.
arXiv Detail & Related papers (2021-06-15T05:30:17Z) - Theory-guided hard constraint projection (HCP): a knowledge-based
data-driven scientific machine learning method [7.778724782015986]
This study proposes theory-guided hard constraint projection (HCP)
This model converts physical constraints, such as governing equations, into a form that is easy to handle through discretization.
The performance of the theory-guided HCP is verified by experiments based on the heterogeneous subsurface flow problem.
arXiv Detail & Related papers (2020-12-11T06:17:43Z) - Geometry Perspective Of Estimating Learning Capability Of Neural
Networks [0.0]
The paper considers a broad class of neural networks with generalized architecture performing simple least square regression with gradient descent (SGD)
The relationship between the generalization capability with the stability of the neural network has also been discussed.
By correlating the principles of high-energy physics with the learning theory of neural networks, the paper establishes a variant of the Complexity-Action conjecture from an artificial neural network perspective.
arXiv Detail & Related papers (2020-11-03T12:03:19Z) - Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks, that is different from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z)