Extraction Propagation
- URL: http://arxiv.org/abs/2402.15883v4
- Date: Mon, 09 Dec 2024 16:41:37 GMT
- Title: Extraction Propagation
- Authors: Stephen Pasteris, Chris Hicks, Vasilios Mavroudis
- Abstract summary: We present an alternative architecture composed of many small neural networks that interact with one another.
Instead of propagating gradients back through the architecture, we propagate vector-valued messages computed via forward passes.
- Score: 4.368185344922342
- License:
- Abstract: Running backpropagation end to end on large neural networks is fraught with difficulties like vanishing gradients and degradation. In this paper we present an alternative architecture composed of many small neural networks that interact with one another. Instead of propagating gradients back through the architecture, we propagate vector-valued messages computed via forward passes, which are then used to update the parameters. Currently the performance is conjectured, as we have yet to implement the architecture. However, we do back it up with some theory. A previous version of this paper was entitled "Fusion encoder networks" and detailed a slightly different architecture.
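Since the abstract only outlines the idea, the following is a minimal, hypothetical sketch of the general pattern it describes: a chain of small networks whose parameters are updated from vector-valued messages computed by forward passes, with no gradient ever crossing a module boundary. The specific local objective used here (matching a fixed random projection of the target, in the spirit of direct-target-projection-style local learning) is an assumption made purely to make the sketch runnable and is not the update rule proposed in the paper.

```python
# Purely illustrative sketch (assumption, not the paper's algorithm): a chain of
# small networks trained from forward-computed, vector-valued messages, with no
# gradient crossing module boundaries.
import torch
import torch.nn as nn

torch.manual_seed(0)
dims = [16, 32, 32, 4]                                    # toy widths (hypothetical)
modules = [nn.Sequential(nn.Linear(dims[i], dims[i + 1]), nn.Tanh())
           for i in range(len(dims) - 1)]
opts = [torch.optim.SGD(m.parameters(), lr=1e-2) for m in modules]
# Fixed random projections that turn the target into a message for each module.
projs = [torch.randn(dims[-1], dims[i + 1]) for i in range(len(dims) - 1)]

x, y = torch.randn(8, dims[0]), torch.randn(8, dims[-1])  # toy batch
msg = x
for module, opt, P in zip(modules, opts, projs):
    msg = module(msg.detach())    # forward pass; detaching blocks cross-module gradients
    target_msg = y @ P            # vector-valued message computed via a forward pass
    loss = ((msg - target_msg) ** 2).mean()
    opt.zero_grad()
    loss.backward()               # gradients stay inside this small network
    opt.step()
```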
Related papers
- How to guess a gradient [68.98681202222664]
We show that gradients are more structured than previously thought.
Exploiting this structure can significantly improve gradient-free optimization schemes.
We highlight new challenges in overcoming the large gap between optimizing with exact gradients and guessing the gradients.
arXiv Detail & Related papers (2023-12-07T21:40:44Z)
- Make Deep Networks Shallow Again [6.647569337929869]
A breakthrough has been achieved by the concept of residual connections.
A stack of residual connection layers can be expressed as an expansion of terms similar to the Taylor expansion.
In other words, a sequential deep architecture is substituted by a parallel shallow one (a small numerical check of this expansion appears after this list).
arXiv Detail & Related papers (2023-09-15T14:18:21Z)
- Centered Self-Attention Layers [89.21791761168032]
The self-attention mechanism in transformers and the message-passing mechanism in graph neural networks are repeatedly applied.
We show that this application inevitably leads to oversmoothing, i.e., to similar representations at the deeper layers.
We present a correction term to the aggregating operator of these mechanisms.
arXiv Detail & Related papers (2023-06-02T15:19:08Z)
- Automatic Gradient Descent: Deep Learning without Hyperparameters [35.350274248478804]
The architecture of a deep neural network is defined explicitly in terms of the number of layers, the width of each layer and the general network topology.
The paper builds a new framework for deriving objective functions; the idea is to transform a Bregman divergence to account for the nonlinear structure of the neural architecture.
arXiv Detail & Related papers (2023-04-11T12:45:52Z)
- Projective Manifold Gradient Layer for Deep Rotation Regression [49.85464297105456]
Regressing rotations on SO(3) manifold using deep neural networks is an important yet unsolved problem.
We propose a manifold-aware gradient that directly backpropagates into deep network weights.
arXiv Detail & Related papers (2021-10-22T08:34:15Z)
- On the Implicit Biases of Architecture & Gradient Descent [46.34988166338264]
This paper finds that while typical networks that fit the training data already generalise fairly well, gradient descent can further improve generalisation by selecting networks with a large margin.
New technical tools suggest a nuanced portrait of generalisation involving both the implicit biases of architecture and gradient descent.
arXiv Detail & Related papers (2021-10-08T17:36:37Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value (a toy sketch of this one-step heuristic appears after this list).
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
- Have convolutions already made recurrence obsolete for unconstrained handwritten text recognition ? [3.0969191504482247]
Unconstrained handwritten text recognition remains an important challenge for deep neural networks.
Recurrent networks and Long Short-Term Memory networks have achieved state-of-the-art performance in this field.
We propose an experimental study regarding different architectures on an offline handwriting recognition task using the RIMES dataset.
arXiv Detail & Related papers (2020-12-09T10:15:24Z)
- Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
arXiv Detail & Related papers (2020-11-26T14:43:04Z)
- Using Graph Neural Networks to Reconstruct Ancient Documents [2.4366811507669124]
We present a solution based on a Graph Neural Network, using pairwise patch information to assign labels to edges.
This network classifies the relationship between a source and a target patch as being one of Up, Down, Left, Right or None.
We show that our model is not only able to provide correct classifications at the edge-level, but also to generate partial or full reconstruction graphs from a set of patches.
arXiv Detail & Related papers (2020-11-13T18:36:36Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
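As referenced in the "Make Deep Networks Shallow Again" entry, the claimed expansion of a residual stack into a sum of parallel terms can be checked directly for linear residual blocks; the linearity is an assumption made here so that the expansion is exact.

```python
# Numerical check (toy, linear blocks): two stacked residual blocks
# (I + A2)(I + A1) equal the parallel sum of terms I + A1 + A2 + A2 @ A1.
import numpy as np

rng = np.random.default_rng(0)
I = np.eye(8)
A1 = 0.1 * rng.standard_normal((8, 8))
A2 = 0.1 * rng.standard_normal((8, 8))

sequential = (I + A2) @ (I + A1)          # deep, sequential form
parallel = I + A1 + A2 + A2 @ A1          # shallow, parallel expansion
assert np.allclose(sequential, parallel)
```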
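For the GradInit entry, the one-step heuristic it describes can be sketched as follows. The tiny grid search over per-layer scales is a stand-in assumption for GradInit's learned scale factors, and the model, data, and grid are toy placeholders.

```python
# Hedged sketch of a GradInit-style heuristic: rescale each layer's initial
# weights so that a single SGD probe step gives the lowest loss. A grid search
# over per-layer scales stands in for GradInit's learned scales (assumption).
import copy, itertools
import torch
import torch.nn as nn

torch.manual_seed(0)
x, y = torch.randn(64, 20), torch.randn(64, 1)
base = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()

def loss_after_one_sgd_step(scales, lr=0.1):
    model = copy.deepcopy(base)
    layers = [m for m in model if isinstance(m, nn.Linear)]
    with torch.no_grad():
        for layer, s in zip(layers, scales):
            layer.weight.mul_(s)                 # rescale this layer's init
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()                                   # the single probe step
    with torch.no_grad():
        return loss_fn(model(x), y).item()

grid = [0.25, 0.5, 1.0, 2.0]
best = min(itertools.product(grid, grid), key=loss_after_one_sgd_step)
print("chosen per-layer scales:", best)
```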
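The observation in the "Permute, Quantize, and Fine-tune" entry, that the weights of two adjacent layers can be permuted while expressing the same function, can also be verified directly; the shapes and the ReLU activation below are toy assumptions, and the rate-distortion-guided search for good permutations is not shown.

```python
# Check: permuting a hidden layer's units together with the matching columns
# of the next layer leaves the composed function unchanged.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((32, 16)), rng.standard_normal(32)
W2, b2 = rng.standard_normal((8, 32)), rng.standard_normal(8)
relu = lambda z: np.maximum(z, 0.0)

def net(x, W1, b1, W2, b2):
    return W2 @ relu(W1 @ x + b1) + b2

perm = rng.permutation(32)                 # permute the hidden units
W1p, b1p = W1[perm], b1[perm]              # permute rows of layer 1
W2p = W2[:, perm]                          # permute columns of layer 2 to match

x = rng.standard_normal(16)
assert np.allclose(net(x, W1, b1, W2, b2), net(x, W1p, b1p, W2p, b2))
```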
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.