Meta-Learning Bidirectional Update Rules
- URL: http://arxiv.org/abs/2104.04657v1
- Date: Sat, 10 Apr 2021 00:56:35 GMT
- Title: Meta-Learning Bidirectional Update Rules
- Authors: Mark Sandler and Max Vladymyrov and Andrey Zhmoginov and Nolan Miller
and Andrew Jackson and Tom Madams and Blaise Aguera y Arcas
- Abstract summary: We introduce a new type of generalized neural network where neurons and synapses maintain multiple states.
We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule.
- Score: 14.397000142362337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a new type of generalized neural network where
neurons and synapses maintain multiple states. We show that classical
gradient-based backpropagation in neural networks can be seen as a special case
of a two-state network where one state is used for activations and another for
gradients, with update rules derived from the chain rule. In our generalized
framework, networks neither have an explicit notion of gradients nor ever receive them.
The synapses and neurons are updated using a bidirectional Hebb-style update
rule parameterized by a shared low-dimensional "genome". We show that such
genomes can be meta-learned from scratch, using either conventional
optimization techniques, or evolutionary strategies, such as CMA-ES. Resulting
update rules generalize to unseen tasks and train faster than gradient descent
based optimizers for several standard computer vision and synthetic tasks.
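
To make the setup concrete, below is a minimal NumPy sketch (not the authors' code) of a two-state network in which neurons and synapses are updated by a bidirectional Hebb-style rule whose coefficients come from a single shared low-dimensional genome, and in which that genome is meta-learned with a plain rank-based Gaussian evolutionary strategy (the paper also reports conventional optimizers and CMA-ES). The state count, genome size, toy regression task, network sizes, and all function names are illustrative assumptions.

```python
# A minimal sketch of the idea in the abstract -- not the authors' implementation.
# Each neuron and synapse keeps K=2 states, the backward pass carries a
# non-gradient signal through the second synapse state, and all synapses are
# updated by a Hebb-style rule whose coefficients come from one shared genome.
import numpy as np

K = 2                     # states per neuron/synapse (backprop is the 2-state special case)
GENOME_DIM = K * K * 2    # one coefficient per (post state, pre state, synapse state) triple
rng = np.random.default_rng(0)

def init_net(sizes, rng):
    # One (K, n_out, n_in) tensor per layer: K synapse states stacked on the first axis.
    return [rng.normal(0, 0.1, size=(K, n_out, n_in))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(weights, x):
    # Forward pass fills state 0 of every neuron; state 1 is filled on the backward pass.
    sizes = [weights[0].shape[2]] + [w.shape[1] for w in weights]
    states = [np.zeros((K, n, x.shape[1])) for n in sizes]
    states[0][0] = x
    for l, w in enumerate(weights):
        states[l + 1][0] = np.tanh(w[0] @ states[l][0])
    return states

def backward(weights, states, target):
    # Backward pass fills state 1. Unlike backprop, this is not a gradient: it is
    # just another neuron state propagated through the second synapse state.
    states[-1][1] = target - states[-1][0]
    for l in range(len(weights) - 1, -1, -1):
        states[l][1] = weights[l][1].T @ states[l + 1][1]
    return states

def hebb_update(weights, states, genome):
    # Bidirectional Hebb-style rule: every synapse state changes by a genome-weighted
    # combination of (batch-averaged) outer products of pre- and post-synaptic states.
    g = genome.reshape(K, K, 2)
    for l, w in enumerate(weights):
        pre, post = states[l], states[l + 1]
        for i in range(K):
            for j in range(K):
                outer = post[i] @ pre[j].T / pre.shape[-1]
                w[0] += g[i, j, 0] * outer
                w[1] += g[i, j, 1] * outer

def inner_train(genome, steps=50):
    # Train a tiny network on a fixed toy regression task using only the Hebb rule
    # (no gradients anywhere); the final loss is the meta-objective for the genome.
    task_rng = np.random.default_rng(1)
    X = task_rng.normal(size=(4, 64))
    Y = np.sin(X.sum(axis=0, keepdims=True))
    weights = init_net([4, 8, 1], np.random.default_rng(2))
    for _ in range(steps):
        states = forward(weights, X)
        states = backward(weights, states, Y)
        hebb_update(weights, states, genome)
    loss = np.mean((forward(weights, X)[-1][0] - Y) ** 2)
    return float(loss) if np.isfinite(loss) else 1e6   # guard against diverging genomes

# Meta-learn the genome with a simple rank-based Gaussian evolutionary strategy,
# used here instead of CMA-ES to keep the sketch dependency-free.
genome, sigma, pop, lr = rng.normal(0, 0.01, GENOME_DIM), 0.05, 16, 0.1
for gen in range(30):
    noise = rng.normal(size=(pop, GENOME_DIM))
    losses = np.array([inner_train(genome + sigma * n) for n in noise])
    ranks = losses.argsort().argsort() / (pop - 1) - 0.5    # best candidate -> -0.5
    genome -= lr * (ranks[:, None] * noise).mean(axis=0)    # move toward low-loss directions
    print(f"meta-step {gen:02d}  mean inner loss {losses.mean():.4f}")
```

The sketch keeps the spirit of the abstract (no gradients anywhere in the inner loop) but simplifies many details of the actual architecture and meta-training procedure described in the paper.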
Related papers
- Towards the Spectral bias Alleviation by Normalizations in Coordinate Networks [20.135740969953723]
Representing signals using coordinate networks has recently come to dominate the area of inverse problems.
There exists an issue of spectral bias in coordinate networks, limiting the capacity to learn high-frequency components.
We find that this pathological distribution can be improved using classical normalization techniques.
arXiv Detail & Related papers (2024-07-25T07:45:28Z) - Rule Based Learning with Dynamic (Graph) Neural Networks [0.8158530638728501]
We present rule based graph neural networks (RuleGNNs) that overcome some limitations of ordinary graph neural networks.
Our experiments show that the predictive performance of RuleGNNs is comparable to state-of-the-art graph classifiers.
We introduce new synthetic benchmark graph datasets to show how to integrate expert knowledge into RuleGNNs.
arXiv Detail & Related papers (2024-06-14T12:01:18Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Automatic Optimisation of Normalised Neural Networks [1.0334138809056097]
We propose automatic optimisation methods that account for the geometry of the matrix manifold of the normalised parameters of neural networks.
Our approach first initialises the network and normalises the data with respect to the $\ell_2$-$\ell_2$ gain of the initialised network.
arXiv Detail & Related papers (2023-12-17T10:13:42Z) - TANGOS: Regularizing Tabular Neural Networks through Gradient
Orthogonalization and Specialization [69.80141512683254]
We introduce Tabular Neural Gradient Orthogonalization and Specialization (TANGOS), a novel framework for regularization in the tabular setting built on latent unit attributions.
We demonstrate that our approach can lead to improved out-of-sample generalization performance, outperforming other popular regularization methods.
arXiv Detail & Related papers (2023-03-09T18:57:13Z) - Artificial Neuronal Ensembles with Learned Context Dependent Gating [0.0]
We introduce Learned Context Dependent Gating (LXDG), a method to flexibly allocate and recall 'artificial neuronal ensembles'.
Activities in the hidden layers of the network are modulated by gates, which are dynamically produced during training.
We demonstrate the ability of this method to alleviate catastrophic forgetting on continual learning benchmarks.
arXiv Detail & Related papers (2023-01-17T20:52:48Z) - WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure and jointly trained with gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
arXiv Detail & Related papers (2023-01-03T20:57:22Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - Optimization Theory for ReLU Neural Networks Trained with Normalization
Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.