Infinite-dimensional Folded-in-time Deep Neural Networks
- URL: http://arxiv.org/abs/2101.02966v1
- Date: Fri, 8 Jan 2021 11:30:50 GMT
- Title: Infinite-dimensional Folded-in-time Deep Neural Networks
- Authors: Florian Stelzer (1 and 2), Serhiy Yanchuk (1) ((1) Institute of Mathematics, Technische Universität Berlin, Germany, (2) Department of Mathematics, Humboldt-Universität zu Berlin, Germany)
- Abstract summary: In this work, we present an infinite-dimensional generalization, which allows for a more rigorous mathematical analysis.
We also provide a functional backpropagation algorithm, which enables gradient descent training of the weights.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The method recently introduced in arXiv:2011.10115 realizes a deep neural
network with just a single nonlinear element and delayed feedback. It is
applicable to the description of physically implemented neural networks. In
this work, we present an infinite-dimensional generalization, which allows for
a more rigorous mathematical analysis and a higher flexibility in choosing the
weight functions. Precisely speaking, the weights are described by Lebesgue
integrable functions instead of step functions. We also provide a functional
backpropagation algorithm, which enables gradient descent training of the
weights. In addition, with a slight modification, our concept realizes
recurrent neural networks.
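To make the folded-in-time idea concrete, here is a minimal, discretized sketch (plain NumPy, Euler integration): a single nonlinear node receives an input signal plus several delayed copies of its own state, each multiplied by a time-dependent modulation function, and the time axis is cut into intervals that play the role of layers. All sizes, delay positions, and modulation signals below are illustrative placeholders; in the paper the weight functions are Lebesgue-integrable functions trained by functional backpropagation, whereas here they are fixed random signals used only to show the signal flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (placeholders, not taken from the paper).
N = 20            # nodes emulated per layer interval
L = 3             # number of hidden layers folded into time
theta = 0.5       # temporal node separation
T = N * theta     # length of one layer interval
dt = 0.01         # Euler integration step
alpha = 1.0       # decay rate of the single dynamical node
f = np.tanh       # the single nonlinear element

t_grid = np.arange(0.0, L * T, dt)

# Delayed-feedback "weight functions": one time-dependent modulation per delay.
# They stand in for the trainable weight functions and are fixed random signals here.
n_delays = 5
delays = [T + (2 - d) * theta for d in range(n_delays)]   # delays clustered around T
modulations = [0.3 * np.sin(2 * np.pi * (d + 1) * t_grid / T + rng.uniform(0, 2 * np.pi))
               for d in range(n_delays)]

# One input sample with N features, "played into" the first layer interval only.
u = rng.standard_normal(N)
input_signal = np.zeros_like(t_grid)
for i in range(N):
    input_signal[(t_grid >= i * theta) & (t_grid < (i + 1) * theta)] = u[i]

# Euler simulation of  x'(t) = -alpha * x(t) + f(a(t)),
# where a(t) collects the input and the delayed, modulated feedback terms.
x = np.zeros_like(t_grid)
for k in range(1, len(t_grid)):
    a = input_signal[k - 1]
    for tau, m in zip(delays, modulations):
        k_tau = k - 1 - int(round(tau / dt))
        if k_tau >= 0:
            a += m[k - 1] * x[k_tau]
    x[k] = x[k - 1] + dt * (-alpha * x[k - 1] + f(a))

# Sample the state at the node positions of the last interval: these values play
# the role of the final hidden layer; a linear readout (not shown) would act on them.
readout_idx = np.round(((L - 1) * T + theta * (np.arange(N) + 1)) / dt).astype(int) - 1
print(x[readout_idx].round(3))
```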
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Neural Functional Transformers [99.98750156515437]
This paper uses the attention mechanism to define a novel set of permutation equivariant weight-space layers called neural functional Transformers (NFTs).
NFTs respect weight-space permutation symmetries while incorporating the advantages of attention, which have exhibited remarkable success across multiple domains.
We also leverage NFTs to develop Inr2Array, a novel method for computing permutation invariant representations from the weights of implicit neural representations (INRs).
arXiv Detail & Related papers (2023-05-22T23:38:27Z) - Introduction to dynamical mean-field theory of generic random neural
networks [2.0711789781518752]
It is not easy for beginners to grasp the essence of this tool and the underlying physics.
We give a pedagogical introduction to this method using the particular example of generic random neural networks.
The numerical implementation of solving the integro-differential mean-field equations is also detailed.
arXiv Detail & Related papers (2023-05-15T09:01:40Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z) - A Sparse Coding Interpretation of Neural Networks and Theoretical
Implications [0.0]
Deep convolutional neural networks have achieved unprecedented performance in various computer vision tasks.
We propose a sparse coding interpretation of neural networks that have ReLU activation.
We derive a complete convolutional neural network without normalization and pooling.
arXiv Detail & Related papers (2021-08-14T21:54:47Z) - Optimal Approximation with Sparse Neural Networks and Applications [0.0]
We use deep sparsely connected neural networks to measure the complexity of a function class in $L^2(\mathbb{R}^d)$.
We also introduce a representation system, a countable collection of functions to guide neural networks.
We then analyse the complexity of a class called $\beta$ cartoon-like functions using rate-distortion theory and the wedgelets construction.
arXiv Detail & Related papers (2021-08-14T05:14:13Z) - Going Beyond Linear RL: Sample Efficient Neural Function Approximation [76.57464214864756]
We study function approximation with two-layer neural networks.
Our results significantly improve upon what can be attained with linear (or eluder dimension) methods.
arXiv Detail & Related papers (2021-07-14T03:03:56Z) - Tensor-based framework for training flexible neural networks [9.176056742068813]
We propose a new learning algorithm which solves a constrained coupled matrix-tensor factorization (CMTF) problem.
The proposed algorithm can handle different basis decompositions.
The goal of this method is to compress large pretrained NN models by replacing tensor sub-networks, i.e., one or multiple layers of the original network, with a new flexible layer.
arXiv Detail & Related papers (2021-06-25T10:26:48Z) - Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation (a minimal sketch of such a Jacobian-based kernel appears after this list).
arXiv Detail & Related papers (2021-03-02T03:23:03Z) - Learning compositional functions via multiplicative weight updates [97.9457834009578]
We show that multiplicative weight updates satisfy a descent lemma tailored to compositional functions.
We show that Madam can train state-of-the-art neural network architectures without learning rate tuning.
arXiv Detail & Related papers (2020-06-25T17:05:19Z)
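The sketch below accompanies the "Fast Adaptation with Linearized Neural Networks" entry above. It illustrates, under simplifying assumptions, the idea of a kernel built from the Jacobian of a network with respect to its parameters: a toy two-layer network is linearized around fixed weights, per-input parameter gradients are computed by finite differences, and the resulting kernel k(x, x') = J(x)·J(x') is used for Gaussian-process regression. The architecture, finite-difference Jacobian, and data are illustrative stand-ins, not that paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny two-layer network with scalar output; all weights live in one flat
# parameter vector so the gradient w.r.t. the parameters is easy to form.
D_in, H = 3, 8
shapes = [(H, D_in), (H,), (1, H), (1,)]
sizes = [int(np.prod(s)) for s in shapes]

def unpack(theta):
    parts, i = [], 0
    for s, n in zip(shapes, sizes):
        parts.append(theta[i:i + n].reshape(s))
        i += n
    return parts

def net(theta, x):
    W1, b1, W2, b2 = unpack(theta)
    return (W2 @ np.tanh(W1 @ x + b1) + b2).item()

theta0 = 0.1 * rng.standard_normal(sum(sizes))   # stand-in for trained weights

def param_gradient(x, eps=1e-5):
    """Gradient of the scalar output w.r.t. the parameters (finite differences)."""
    g = np.zeros_like(theta0)
    for i in range(theta0.size):
        e = np.zeros_like(theta0)
        e[i] = eps
        g[i] = (net(theta0 + e, x) - net(theta0 - e, x)) / (2 * eps)
    return g

def jacobian_kernel(X1, X2):
    """k(x, x') = J(x) . J(x'): the kernel of the network linearized at theta0."""
    J1 = np.stack([param_gradient(x) for x in X1])
    J2 = np.stack([param_gradient(x) for x in X2])
    return J1 @ J2.T

# Gaussian-process regression with the Jacobian kernel on toy data.
X_train = rng.standard_normal((10, D_in))
y_train = np.sin(X_train[:, 0])
X_test = rng.standard_normal((4, D_in))

noise = 1e-2
K = jacobian_kernel(X_train, X_train) + noise * np.eye(len(X_train))
K_star = jacobian_kernel(X_test, X_train)
posterior_mean = K_star @ np.linalg.solve(K, y_train)
print(posterior_mean.round(3))
```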