Learning compositional functions via multiplicative weight updates
- URL: http://arxiv.org/abs/2006.14560v2
- Date: Fri, 8 Jan 2021 17:34:41 GMT
- Title: Learning compositional functions via multiplicative weight updates
- Authors: Jeremy Bernstein, Jiawei Zhao, Markus Meister, Ming-Yu Liu, Anima Anandkumar, Yisong Yue
- Abstract summary: We show that multiplicative weight updates satisfy a descent lemma tailored to compositional functions.
We show that Madam can train state of the art neural network architectures without learning rate tuning.
- Score: 97.9457834009578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compositionality is a basic structural feature of both biological and
artificial neural networks. Learning compositional functions via gradient
descent incurs well known problems like vanishing and exploding gradients,
making careful learning rate tuning essential for real-world applications. This
paper proves that multiplicative weight updates satisfy a descent lemma
tailored to compositional functions. Based on this lemma, we derive Madam -- a
multiplicative version of the Adam optimiser -- and show that it can train
state of the art neural network architectures without learning rate tuning. We
further show that Madam is easily adapted to train natively compressed neural
networks by representing their weights in a logarithmic number system. We
conclude by drawing connections between multiplicative weight updates and
recent findings about synapses in biology.
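To make the abstract's central idea concrete, the following is a minimal NumPy sketch of a Madam-style multiplicative update, reconstructed only from the description above: the gradient is normalised by a running estimate of its second moment (as in Adam), and each weight's magnitude is then rescaled by a bounded multiplicative factor while its sign is kept fixed. The hyperparameter names, the exact normalisation, and the toy usage are illustrative assumptions, not the paper's precise algorithm.

```python
import numpy as np

def madam_step(w, grad, second_moment, lr=0.01, beta=0.999, eps=1e-12):
    """One Madam-style multiplicative weight update (illustrative sketch).

    Assumptions: Adam-like second-moment normalisation without bias
    correction, and no clipping of the weight magnitudes.
    """
    # Running estimate of the squared gradient.
    second_moment = beta * second_moment + (1.0 - beta) * grad ** 2
    normalised_grad = grad / (np.sqrt(second_moment) + eps)

    # Multiplicative step: rescale |w| by a bounded factor, keeping sign(w).
    w = w * np.exp(-lr * np.sign(w) * normalised_grad)
    return w, second_moment

# Toy usage on a random weight matrix and gradient.
rng = np.random.default_rng(0)
w = 0.1 * rng.normal(size=(4, 4))
g = rng.normal(size=(4, 4))
v = np.zeros_like(w)                 # second-moment state
w, v = madam_step(w, g, v)
```

Because the normalised gradient has magnitude on the order of one, each step multiplies |w| by a factor of roughly exp(±lr), so every weight changes by a bounded relative amount per step. Since the sign is fixed and the update is additive in log|w|, it can also be applied directly to weights stored in a logarithmic number system, as the abstract describes.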
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Low Tensor Rank Learning of Neural Dynamics [0.0]
We show that low-tensor-rank weights emerge naturally in RNNs trained to solve low-dimensional tasks.
Our findings provide insight into the evolution of population connectivity over learning in both biological and artificial neural networks.
arXiv Detail & Related papers (2023-08-22T17:08:47Z) - Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden-layer neurons have no inherent order (this symmetry is illustrated in a short sketch after this list).
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z) - Deep Learning Meets Sparse Regularization: A Signal Processing Perspective [17.12783792226575]
We present a mathematical framework that characterizes the functional properties of neural networks that are trained to fit data.
Key mathematical tools which support this framework include transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory.
This framework explains the effect of weight decay regularization in neural network training, the use of skip connections and low-rank weight matrices in network architectures, the role of sparsity in neural networks, and why neural networks can perform well in high-dimensional problems.
arXiv Detail & Related papers (2023-01-23T17:16:21Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Infinite-dimensional Folded-in-time Deep Neural Networks [0.0]
In this work, we present an infinite-dimensional generalization, which allows for a more rigorous mathematical analysis.
We also provide a functional backpropagation algorithm, which enables gradient-descent training of the weights.
arXiv Detail & Related papers (2021-01-08T11:30:50Z) - A multi-agent model for growing spiking neural networks [0.0]
This project has explored rules for growing the connections between the neurons in Spiking Neural Networks as a learning mechanism.
Results in a simulation environment showed that, for a given set of parameters, it is possible to reach topologies that reproduce the tested functions.
This project also opens the door to the usage of techniques like genetic algorithms for obtaining the best suited values for the model parameters.
arXiv Detail & Related papers (2020-09-21T15:11:29Z) - Graph Structure of Neural Networks [104.33754950606298]
We show how the graph structure of neural networks affects their predictive performance.
A "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance.
Top-performing neural networks have graph structure surprisingly similar to those of real biological neural networks.
arXiv Detail & Related papers (2020-07-13T17:59:31Z) - Hcore-Init: Neural Network Initialization based on Graph Degeneracy [22.923756039561194]
We propose an adapted version of the k-core structure for the complete weighted multipartite graph extracted from a deep learning architecture.
As a multipartite graph is a combination of bipartite graphs, which are in turn the incidence graphs of hypergraphs, we design a k-hypercore decomposition.
arXiv Detail & Related papers (2020-04-16T12:57:14Z) - Geometrically Principled Connections in Graph Neural Networks [66.51286736506658]
We argue geometry should remain the primary driving force behind innovation in the emerging field of geometric deep learning.
We relate graph neural networks to widely successful computer graphics and data approximation models: radial basis functions (RBFs).
We introduce affine skip connections, a novel building block formed by combining a fully connected layer with any graph convolution operator (a rough sketch of this construction appears after this list).
arXiv Detail & Related papers (2020-04-06T13:25:46Z)
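Two short illustrative sketches referenced in the entries above follow; both are our own minimal examples based only on the abstract summaries, not code from the papers.

First, the hidden-unit permutation symmetry mentioned in the Permutation Equivariant Neural Functionals entry: permuting the hidden neurons of a two-layer ReLU network, together with the matching rows of the first weight matrix and columns of the second, leaves the network's output unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0.0)   # ReLU hidden layer
    return W2 @ h + b2

x = rng.normal(size=d_in)
perm = rng.permutation(d_hidden)

# Permute hidden units: reorder rows of W1/b1 and columns of W2.
W1_p, b1_p, W2_p = W1[perm], b1[perm], W2[:, perm]
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1_p, b1_p, W2_p, b2))
```

Second, a rough sketch of the affine skip connection described in the Geometrically Principled Connections entry: the output of a fully connected (affine) layer on the raw node features is added to the output of a graph convolution. The particular convolution used here (mean aggregation over neighbours with self-loops) is an assumption chosen for brevity, not the operator from that paper.

```python
import numpy as np

def graph_conv(X, A, W):
    # Simple mean-aggregation graph convolution with self-loops.
    A_hat = A + np.eye(A.shape[0])
    A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)
    return A_norm @ X @ W

def affine_skip_layer(X, A, W_conv, W_skip, b_skip):
    # Affine skip connection: graph convolution plus a fully connected
    # (affine) transform of the input node features.
    return graph_conv(X, A, W_conv) + X @ W_skip + b_skip

# Toy usage on a random undirected graph with 5 nodes and 3-dim features.
rng = np.random.default_rng(1)
n, d, d_out = 5, 3, 2
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)                 # symmetrise the adjacency matrix
X = rng.normal(size=(n, d))
out = affine_skip_layer(X, A, rng.normal(size=(d, d_out)),
                        rng.normal(size=(d, d_out)), rng.normal(size=d_out))
```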