A theory of learning with constrained weight-distribution
- URL: http://arxiv.org/abs/2206.08933v1
- Date: Tue, 14 Jun 2022 00:43:34 GMT
- Title: A theory of learning with constrained weight-distribution
- Authors: Weishun Zhong, Ben Sorscher, Daniel D Lee, Haim Sompolinsky
- Abstract summary: We develop a statistical mechanical theory of learning in neural networks that incorporates structural information as constraints.
We show that training in our algorithm can be interpreted as geodesic flows in the Wasserstein space of probability distributions.
Our theory and algorithm provide novel strategies for incorporating prior knowledge about weights into learning, and reveal a powerful connection between structure and function in neural networks.
- Score: 17.492950552276067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central question in computational neuroscience is how structure determines
function in neural networks. The emerging high-quality large-scale connectomic
datasets raise the question of what general functional principles can be
gleaned from structural information such as the distribution of
excitatory/inhibitory synapse types and the distribution of synaptic weights.
Motivated by this question, we developed a statistical mechanical theory of
learning in neural networks that incorporates structural information as
constraints. We derived an analytical solution for the memory capacity of the
perceptron, a basic feedforward model of supervised learning, with a
constraint on the distribution of its weights. Our theory predicts that the
reduction in capacity due to the constrained weight distribution is related to
the Wasserstein distance between the imposed distribution and the standard
normal distribution. To test the theoretical predictions, we use optimal
transport theory and information geometry to develop an SGD-based algorithm to
find weights that simultaneously learn the input-output task and satisfy the
distribution constraint. We show that training in our algorithm can be
interpreted as geodesic flows in the Wasserstein space of probability
distributions. We further developed a statistical mechanical theory for
teacher-student perceptron rule learning and asked how the student can best
incorporate prior knowledge of the rule. Our theory shows that it is
beneficial for the learner to adopt different prior weight distributions during
learning, and that distribution-constrained learning outperforms unconstrained
and sign-constrained learning. Our theory and algorithm provide
novel strategies for incorporating prior knowledge about weights into learning,
and reveal a powerful connection between structure and function in neural
networks.
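To make the constrained-learning idea above concrete, the following is a minimal, hypothetical Python/NumPy sketch, not the authors' actual algorithm (which is formulated as geodesic flows in Wasserstein space): plain perceptron-style SGD steps are alternated with a projection of the weight vector onto a target weight distribution. In one dimension, the optimal-transport map onto the target is the quantile-matching (sorting) map, and the same sorted representation gives the empirical Wasserstein-2 distance that appears in the capacity prediction. The toy task, the log-normal target, and all names are illustrative assumptions.

```python
# Minimal illustrative sketch (assumptions, not the paper's exact algorithm):
# perceptron-style SGD alternated with a projection of the weights onto a
# prescribed weight distribution. In 1D, the optimal-transport map onto the
# target is the quantile-matching (sorting) map; the same sorted view yields
# the empirical Wasserstein-2 distance used in the capacity prediction.
import numpy as np

rng = np.random.default_rng(0)

def project_to_distribution(w, target_sorted):
    """Rank-preserving (1D optimal-transport) projection: the i-th smallest
    weight is replaced by the i-th smallest sample of the target."""
    w_proj = np.empty_like(target_sorted)
    w_proj[np.argsort(w)] = target_sorted
    return w_proj

def wasserstein2(w, target_sorted):
    """Empirical 1D Wasserstein-2 distance between the weights' empirical
    distribution and the target, computed from sorted samples."""
    return np.sqrt(np.mean((np.sort(w) - target_sorted) ** 2))

# Toy capacity-style task: P random patterns with random +/-1 labels.
N, P = 200, 100
X = rng.standard_normal((P, N))
y = rng.choice([-1.0, 1.0], size=P)

# Hypothetical target weight distribution: log-normal (all positive, as for
# excitatory synapses), rescaled so that ||w||^2 = N.
target_sorted = np.sort(rng.lognormal(mean=0.0, sigma=1.0, size=N))
target_sorted *= np.sqrt(N) / np.linalg.norm(target_sorted)

w = project_to_distribution(rng.standard_normal(N), target_sorted)
lr, kappa = 0.05, 0.1  # learning rate and target margin
for epoch in range(500):
    margins = y * (X @ w) / np.sqrt(N)
    bad = margins < kappa            # patterns not yet learned with margin kappa
    if not bad.any():
        break
    w = w + lr * (y[bad, None] * X[bad]).mean(axis=0)  # perceptron-style update
    w = project_to_distribution(w, target_sorted)      # re-impose the constraint

print("smallest margin:", float((y * (X @ w) / np.sqrt(N)).min()))
print("W2(weights, target):", float(wasserstein2(w, target_sorted)))
# W2 to a standard normal could be computed the same way against sorted
# Gaussian samples, to relate the constraint to the predicted capacity loss.
```

The projection step is the simplest stand-in for the paper's optimal-transport machinery: after every gradient step the weights are forced back onto the exact target distribution, at the cost of ignoring the geometry-aware (geodesic) updates the paper describes.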
Related papers
- Learning Theory of Distribution Regression with Neural Networks [6.961253535504979]
We establish an approximation theory and a learning theory of distribution regression via a fully connected neural network (FNN)
In contrast to the classical regression methods, the input variables of distribution regression are probability measures.
arXiv Detail & Related papers (2023-07-07T09:49:11Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST)
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - Statistical mechanics of continual learning: variational principle and
mean-field potential [1.559929646151698]
We focus on continual learning in single-layered and multi-layered neural networks of binary weights.
A variational Bayesian learning setting is proposed, where the neural networks are trained in a field-space.
Weight uncertainty is naturally incorporated, and modulates synaptic resources among tasks.
Our proposed frameworks also connect to elastic weight consolidation, weight-uncertainty learning, and neuroscience inspired metaplasticity.
arXiv Detail & Related papers (2022-12-06T09:32:45Z) - Statistical Physics of Deep Neural Networks: Initialization toward
Optimal Channels [6.144858413112823]
In deep learning, neural networks serve as noisy channels between input data and its representation.
We study a frequently overlooked possibility that neural networks can be intrinsic toward optimal channels.
arXiv Detail & Related papers (2022-12-04T05:13:01Z) - The Causal Neural Connection: Expressiveness, Learnability, and
Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z) - Credit Assignment in Neural Networks through Deep Feedback Control [59.14935871979047]
Deep Feedback Control (DFC) is a new learning method that uses a feedback controller to drive a deep neural network to match a desired output target and whose control signal can be used for credit assignment.
The resulting learning rule is fully local in space and time and approximates Gauss-Newton optimization for a wide range of connectivity patterns.
To further underline its biological plausibility, we relate DFC to a multi-compartment model of cortical pyramidal neurons with a local voltage-dependent synaptic plasticity rule, consistent with recent theories of dendritic processing.
arXiv Detail & Related papers (2021-06-15T05:30:17Z) - Theory-guided hard constraint projection (HCP): a knowledge-based
data-driven scientific machine learning method [7.778724782015986]
This study proposes theory-guided hard constraint projection (HCP)
This model converts physical constraints, such as governing equations, into a form that is easy to handle through discretization.
The performance of the theory-guided HCP is verified by experiments based on the heterogeneous subsurface flow problem.
arXiv Detail & Related papers (2020-12-11T06:17:43Z) - Geometry Perspective Of Estimating Learning Capability Of Neural
Networks [0.0]
The paper considers a broad class of neural networks with generalized architecture performing simple least square regression with gradient descent (SGD)
The relationship between the generalization capability with the stability of the neural network has also been discussed.
By correlating the principles of high-energy physics with the learning theory of neural networks, the paper establishes a variant of the Complexity-Action conjecture from an artificial neural network perspective.
arXiv Detail & Related papers (2020-11-03T12:03:19Z) - Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks, that is different from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z)