Meta-Learning Bidirectional Update Rules
- URL: http://arxiv.org/abs/2104.04657v1
- Date: Sat, 10 Apr 2021 00:56:35 GMT
- Title: Meta-Learning Bidirectional Update Rules
- Authors: Mark Sandler and Max Vladymyrov and Andrey Zhmoginov and Nolan Miller
and Andrew Jackson and Tom Madams and Blaise Aguera y Arcas
- Abstract summary: We introduce a new type of generalized neural network where neurons and synapses maintain multiple states.
We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule.
- Score: 14.397000142362337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a new type of generalized neural network where
neurons and synapses maintain multiple states. We show that classical
gradient-based backpropagation in neural networks can be seen as a special case
of a two-state network where one state is used for activations and another for
gradients, with update rules derived from the chain rule. In our generalized
framework, networks neither have an explicit notion of gradients nor ever receive them.
The synapses and neurons are updated using a bidirectional Hebb-style update
rule parameterized by a shared low-dimensional "genome". We show that such
genomes can be meta-learned from scratch, using either conventional
optimization techniques, or evolutionary strategies, such as CMA-ES. Resulting
update rules generalize to unseen tasks and train faster than gradient descent
based optimizers for several standard computer vision and synthetic tasks.
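
To make the setup concrete, below is a minimal NumPy sketch (not the authors' code) of a two-state network in which neurons and synapses are updated by a bidirectional Hebb-style rule whose coefficients come from a single shared low-dimensional genome, and in which that genome is meta-learned with a plain rank-based Gaussian evolutionary strategy (the paper also reports conventional optimizers and CMA-ES). The state count, genome size, toy regression task, network sizes, and all function names are illustrative assumptions.

```python
# A minimal sketch of the idea in the abstract -- not the authors' implementation.
# Each neuron and synapse keeps K=2 states, the backward pass carries a
# non-gradient signal through the second synapse state, and all synapses are
# updated by a Hebb-style rule whose coefficients come from one shared genome.
import numpy as np

K = 2                     # states per neuron/synapse (backprop is the 2-state special case)
GENOME_DIM = K * K * 2    # one coefficient per (post state, pre state, synapse state) triple
rng = np.random.default_rng(0)

def init_net(sizes, rng):
    # One (K, n_out, n_in) tensor per layer: K synapse states stacked on the first axis.
    return [rng.normal(0, 0.1, size=(K, n_out, n_in))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(weights, x):
    # Forward pass fills state 0 of every neuron; state 1 is filled on the backward pass.
    sizes = [weights[0].shape[2]] + [w.shape[1] for w in weights]
    states = [np.zeros((K, n, x.shape[1])) for n in sizes]
    states[0][0] = x
    for l, w in enumerate(weights):
        states[l + 1][0] = np.tanh(w[0] @ states[l][0])
    return states

def backward(weights, states, target):
    # Backward pass fills state 1. Unlike backprop, this is not a gradient: it is
    # just another neuron state propagated through the second synapse state.
    states[-1][1] = target - states[-1][0]
    for l in range(len(weights) - 1, -1, -1):
        states[l][1] = weights[l][1].T @ states[l + 1][1]
    return states

def hebb_update(weights, states, genome):
    # Bidirectional Hebb-style rule: every synapse state changes by a genome-weighted
    # combination of (batch-averaged) outer products of pre- and post-synaptic states.
    g = genome.reshape(K, K, 2)
    for l, w in enumerate(weights):
        pre, post = states[l], states[l + 1]
        for i in range(K):
            for j in range(K):
                outer = post[i] @ pre[j].T / pre.shape[-1]
                w[0] += g[i, j, 0] * outer
                w[1] += g[i, j, 1] * outer

def inner_train(genome, steps=50):
    # Train a tiny network on a fixed toy regression task using only the Hebb rule
    # (no gradients anywhere); the final loss is the meta-objective for the genome.
    task_rng = np.random.default_rng(1)
    X = task_rng.normal(size=(4, 64))
    Y = np.sin(X.sum(axis=0, keepdims=True))
    weights = init_net([4, 8, 1], np.random.default_rng(2))
    for _ in range(steps):
        states = forward(weights, X)
        states = backward(weights, states, Y)
        hebb_update(weights, states, genome)
    loss = np.mean((forward(weights, X)[-1][0] - Y) ** 2)
    return float(loss) if np.isfinite(loss) else 1e6   # guard against diverging genomes

# Meta-learn the genome with a simple rank-based Gaussian evolutionary strategy,
# used here instead of CMA-ES to keep the sketch dependency-free.
genome, sigma, pop, lr = rng.normal(0, 0.01, GENOME_DIM), 0.05, 16, 0.1
for gen in range(30):
    noise = rng.normal(size=(pop, GENOME_DIM))
    losses = np.array([inner_train(genome + sigma * n) for n in noise])
    ranks = losses.argsort().argsort() / (pop - 1) - 0.5    # best candidate -> -0.5
    genome -= lr * (ranks[:, None] * noise).mean(axis=0)    # move toward low-loss directions
    print(f"meta-step {gen:02d}  mean inner loss {losses.mean():.4f}")
```

The sketch keeps the spirit of the abstract (no gradients anywhere in the inner loop) but simplifies many details of the actual architecture and meta-training procedure described in the paper.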
Related papers
- Towards the Spectral bias Alleviation by Normalizations in Coordinate Networks [20.135740969953723]
Representing signals using coordinate networks has recently come to dominate the area of inverse problems.
There exists an issue of spectral bias in coordinate networks, limiting the capacity to learn high-frequency components.
We find that this pathological distribution can be improved using classical normalization techniques.
arXiv Detail & Related papers (2024-07-25T07:45:28Z) - Rule Based Learning with Dynamic (Graph) Neural Networks [0.8158530638728501]
We present rule based graph neural networks (RuleGNNs) that overcome some limitations of ordinary graph neural networks.
Our experiments show that the predictive performance of RuleGNNs is comparable to state-of-the-art graph classifiers.
We introduce new synthetic benchmark graph datasets to show how to integrate expert knowledge into RuleGNNs.
arXiv Detail & Related papers (2024-06-14T12:01:18Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Automatic Optimisation of Normalised Neural Networks [1.0334138809056097]
We propose automatic optimisation methods that account for the geometry of the matrix manifold of the normalised parameters of neural networks.
Our approach first initialises the network and normalises the data with respect to the $\ell_2$-$\ell_2$ gain of the initialised network.
arXiv Detail & Related papers (2023-12-17T10:13:42Z) - TANGOS: Regularizing Tabular Neural Networks through Gradient
Orthogonalization and Specialization [69.80141512683254]
We introduce Tabular Neural Gradient Orthogonalization and Specialization (TANGOS), a novel framework for regularization in the tabular setting built on latent unit attributions.
We demonstrate that our approach can lead to improved out-of-sample generalization performance, outperforming other popular regularization methods.
arXiv Detail & Related papers (2023-03-09T18:57:13Z) - Artificial Neuronal Ensembles with Learned Context Dependent Gating [0.0]
We introduce Learned Context Dependent Gating (LXDG), a method to flexibly allocate and recall 'artificial neuronal ensembles'.
Activities in the hidden layers of the network are modulated by gates, which are dynamically produced during training.
We demonstrate the ability of this method to alleviate catastrophic forgetting on continual learning benchmarks.
arXiv Detail & Related papers (2023-01-17T20:52:48Z) - WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure and jointly trained with gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
arXiv Detail & Related papers (2023-01-03T20:57:22Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - Optimization Theory for ReLU Neural Networks Trained with Normalization
Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.