Extraction Propagation
- URL: http://arxiv.org/abs/2402.15883v4
- Date: Mon, 09 Dec 2024 16:41:37 GMT
- Title: Extraction Propagation
- Authors: Stephen Pasteris, Chris Hicks, Vasilios Mavroudis
- Abstract summary: We present an alternative architecture composed of many small neural networks that interact with one another.
Instead of propagating gradients back through the architecture, we propagate vector-valued messages computed via forward passes.
- Score: 4.368185344922342
- License:
- Abstract: Running backpropagation end to end on large neural networks is fraught with difficulties like vanishing gradients and degradation. In this paper we present an alternative architecture composed of many small neural networks that interact with one another. Instead of propagating gradients back through the architecture, we propagate vector-valued messages computed via forward passes, which are then used to update the parameters. Currently the performance is conjectured, as we have yet to implement the architecture. However, we do back it up with some theory. A previous version of this paper was entitled "Fusion encoder networks" and detailed a slightly different architecture.
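Since the abstract only outlines the idea, the following is a minimal, hypothetical sketch of the general pattern it describes: a chain of small networks whose parameters are updated from vector-valued messages computed by forward passes, with no gradient ever crossing a module boundary. The specific local objective used here (matching a fixed random projection of the target, in the spirit of direct-target-projection-style local learning) is an assumption made purely to make the sketch runnable and is not the update rule proposed in the paper.

```python
# Purely illustrative sketch (assumption, not the paper's algorithm): a chain of
# small networks trained from forward-computed, vector-valued messages, with no
# gradient crossing module boundaries.
import torch
import torch.nn as nn

torch.manual_seed(0)
dims = [16, 32, 32, 4]                                    # toy widths (hypothetical)
modules = [nn.Sequential(nn.Linear(dims[i], dims[i + 1]), nn.Tanh())
           for i in range(len(dims) - 1)]
opts = [torch.optim.SGD(m.parameters(), lr=1e-2) for m in modules]
# Fixed random projections that turn the target into a message for each module.
projs = [torch.randn(dims[-1], dims[i + 1]) for i in range(len(dims) - 1)]

x, y = torch.randn(8, dims[0]), torch.randn(8, dims[-1])  # toy batch
msg = x
for module, opt, P in zip(modules, opts, projs):
    msg = module(msg.detach())    # forward pass; detaching blocks cross-module gradients
    target_msg = y @ P            # vector-valued message computed via a forward pass
    loss = ((msg - target_msg) ** 2).mean()
    opt.zero_grad()
    loss.backward()               # gradients stay inside this small network
    opt.step()
```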
Related papers
- How to guess a gradient [68.98681202222664]
We show that gradients are more structured than previously thought.
Exploiting this structure can significantly improve gradient-free optimization schemes.
We highlight new challenges in overcoming the large gap between optimizing with exact gradients and guessing the gradients.
arXiv Detail & Related papers (2023-12-07T21:40:44Z)
- Make Deep Networks Shallow Again [6.647569337929869]
A breakthrough has been achieved by the concept of residual connections.
A stack of residual connection layers can be expressed as an expansion of terms similar to the Taylor expansion.
In other words, a sequential deep architecture is substituted by a parallel shallow one (a small numerical check of this expansion appears after this list).
arXiv Detail & Related papers (2023-09-15T14:18:21Z)
- Centered Self-Attention Layers [89.21791761168032]
The self-attention mechanism in transformers and the message-passing mechanism in graph neural networks are repeatedly applied.
We show that this application inevitably leads to oversmoothing, i.e., to similar representations at the deeper layers.
We present a correction term to the aggregating operator of these mechanisms.
arXiv Detail & Related papers (2023-06-02T15:19:08Z)
- Automatic Gradient Descent: Deep Learning without Hyperparameters [35.350274248478804]
The architecture of a deep neural network is defined explicitly in terms of the number of layers, the width of each layer and the general network topology.
The paper builds a new framework for deriving objective functions; the idea is to transform a Bregman divergence to account for the nonlinear structure of the neural architecture.
arXiv Detail & Related papers (2023-04-11T12:45:52Z)
- Projective Manifold Gradient Layer for Deep Rotation Regression [49.85464297105456]
Regressing rotations on SO(3) manifold using deep neural networks is an important yet unsolved problem.
We propose a manifold-aware gradient that directly backpropagates into deep network weights.
arXiv Detail & Related papers (2021-10-22T08:34:15Z)
- On the Implicit Biases of Architecture & Gradient Descent [46.34988166338264]
This paper finds that while typical networks that fit the training data already generalise fairly well, gradient descent can further improve generalisation by selecting networks with a large margin.
New technical tools suggest a nuanced portrait of generalisation involving both the implicit biases of architecture and gradient descent.
arXiv Detail & Related papers (2021-10-08T17:36:37Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value (a toy sketch of this one-step heuristic appears after this list).
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
- Have convolutions already made recurrence obsolete for unconstrained handwritten text recognition ? [3.0969191504482247]
Unconstrained handwritten text recognition remains an important challenge for deep neural networks.
Recurrent networks and Long Short-Term Memory networks have achieved state-of-the-art performance in this field.
We propose an experimental study regarding different architectures on an offline handwriting recognition task using the RIMES dataset.
arXiv Detail & Related papers (2020-12-09T10:15:24Z)
- Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
arXiv Detail & Related papers (2020-11-26T14:43:04Z)
- Using Graph Neural Networks to Reconstruct Ancient Documents [2.4366811507669124]
We present a solution based on a Graph Neural Network, using pairwise patch information to assign labels to edges.
This network classifies the relationship between a source and a target patch as being one of Up, Down, Left, Right or None.
We show that our model is not only able to provide correct classifications at the edge-level, but also to generate partial or full reconstruction graphs from a set of patches.
arXiv Detail & Related papers (2020-11-13T18:36:36Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
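As referenced in the "Make Deep Networks Shallow Again" entry, the claimed expansion of a residual stack into a sum of parallel terms can be checked directly for linear residual blocks; the linearity is an assumption made here so that the expansion is exact.

```python
# Numerical check (toy, linear blocks): two stacked residual blocks
# (I + A2)(I + A1) equal the parallel sum of terms I + A1 + A2 + A2 @ A1.
import numpy as np

rng = np.random.default_rng(0)
I = np.eye(8)
A1 = 0.1 * rng.standard_normal((8, 8))
A2 = 0.1 * rng.standard_normal((8, 8))

sequential = (I + A2) @ (I + A1)          # deep, sequential form
parallel = I + A1 + A2 + A2 @ A1          # shallow, parallel expansion
assert np.allclose(sequential, parallel)
```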
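For the GradInit entry, the one-step heuristic it describes can be sketched as follows. The tiny grid search over per-layer scales is a stand-in assumption for GradInit's learned scale factors, and the model, data, and grid are toy placeholders.

```python
# Hedged sketch of a GradInit-style heuristic: rescale each layer's initial
# weights so that a single SGD probe step gives the lowest loss. A grid search
# over per-layer scales stands in for GradInit's learned scales (assumption).
import copy, itertools
import torch
import torch.nn as nn

torch.manual_seed(0)
x, y = torch.randn(64, 20), torch.randn(64, 1)
base = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()

def loss_after_one_sgd_step(scales, lr=0.1):
    model = copy.deepcopy(base)
    layers = [m for m in model if isinstance(m, nn.Linear)]
    with torch.no_grad():
        for layer, s in zip(layers, scales):
            layer.weight.mul_(s)                 # rescale this layer's init
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()                                   # the single probe step
    with torch.no_grad():
        return loss_fn(model(x), y).item()

grid = [0.25, 0.5, 1.0, 2.0]
best = min(itertools.product(grid, grid), key=loss_after_one_sgd_step)
print("chosen per-layer scales:", best)
```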
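The observation in the "Permute, Quantize, and Fine-tune" entry, that the weights of two adjacent layers can be permuted while expressing the same function, can also be verified directly; the shapes and the ReLU activation below are toy assumptions, and the rate-distortion-guided search for good permutations is not shown.

```python
# Check: permuting a hidden layer's units together with the matching columns
# of the next layer leaves the composed function unchanged.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((32, 16)), rng.standard_normal(32)
W2, b2 = rng.standard_normal((8, 32)), rng.standard_normal(8)
relu = lambda z: np.maximum(z, 0.0)

def net(x, W1, b1, W2, b2):
    return W2 @ relu(W1 @ x + b1) + b2

perm = rng.permutation(32)                 # permute the hidden units
W1p, b1p = W1[perm], b1[perm]              # permute rows of layer 1
W2p = W2[:, perm]                          # permute columns of layer 2 to match

x = rng.standard_normal(16)
assert np.allclose(net(x, W1, b1, W2, b2), net(x, W1p, b1p, W2p, b2))
```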
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.