Tensor-based framework for training flexible neural networks
- URL: http://arxiv.org/abs/2106.13542v1
- Date: Fri, 25 Jun 2021 10:26:48 GMT
- Title: Tensor-based framework for training flexible neural networks
- Authors: Yassine Zniyed, Konstantin Usevich, Sebastian Miron, David Brie
- Abstract summary: We propose a new learning algorithm which solves a constrained coupled matrix-tensor factorization (CMTF) problem.
The proposed algorithm can handle different decomposition bases.
The goal of this method is to compress large pretrained NN models, by replacing subnetworks, i.e., one or multiple layers of the original network, by a new flexible layer.
- Score: 9.176056742068813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Activation functions (AFs) are an important part of the design of neural
networks (NNs), and their choice plays a predominant role in the performance of
a NN. In this work, we are particularly interested in the estimation of
flexible activation functions using tensor-based solutions, where the AFs are
expressed as a weighted sum of predefined basis functions. To do so, we propose
a new learning algorithm which solves a constrained coupled matrix-tensor
factorization (CMTF) problem. This technique fuses the first and zeroth order
information of the NN, where the first-order information is contained in a
Jacobian tensor, following a constrained canonical polyadic decomposition
(CPD). The proposed algorithm can handle different decomposition bases. The
goal of this method is to compress large pretrained NN models, by replacing
subnetworks, i.e., one or multiple layers of the original network, by a
new flexible layer. The approach is applied to a pretrained convolutional
neural network (CNN) used for character classification.
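To make the parameterization concrete: the flexible activation functions studied here are weighted sums of predefined basis functions. The sketch below is a minimal illustration of that form, fitting the basis weights of one activation to a target nonlinearity by ordinary least squares; the polynomial basis and function names are illustrative assumptions, and this is a zeroth-order fit only, not the authors' constrained CMTF algorithm, which also fuses first-order (Jacobian) information.

```python
import numpy as np

def flexible_activation(x, weights, bases):
    """Evaluate a flexible AF: sigma(x) = sum_k w_k * phi_k(x)."""
    return sum(w * phi(x) for w, phi in zip(weights, bases))

# Illustrative polynomial basis (the paper allows different decomposition bases).
bases = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2, lambda x: x**3]

# Fit the basis weights to a target AF (here tanh) on sample points by least squares.
x = np.linspace(-2, 2, 200)
Phi = np.stack([phi(x) for phi in bases], axis=1)      # (200, 4) design matrix
w, *_ = np.linalg.lstsq(Phi, np.tanh(x), rcond=None)   # zeroth-order fit only

print("fitted weights:", w)
print("max abs error:", np.abs(flexible_activation(x, w, bases) - np.tanh(x)).max())
```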
Related papers
- Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
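As a rough illustration of permutation equivariance over hidden-neuron order (a DeepSets-style toy map, not the neural-functional layers proposed in the paper), the sketch below transforms every row of a hidden-layer weight matrix identically plus a pooled term, so reordering neurons reorders the output in the same way.

```python
import numpy as np

def perm_equivariant_linear(W, a=0.5, b=0.1):
    """DeepSets-style equivariant map on the rows of a hidden-layer weight matrix:
    each row is transformed identically, plus a term depending on the row mean,
    so permuting the rows of W permutes the output rows accordingly."""
    return a * W + b * W.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))          # weights of a hidden layer with 5 neurons
perm = rng.permutation(5)

out = perm_equivariant_linear(W)
out_perm = perm_equivariant_linear(W[perm])
print(np.allclose(out[perm], out_perm))  # True: equivariant under neuron reordering
```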
arXiv Detail & Related papers (2023-02-27T18:52:38Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
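For intuition on the recurrent superstructure view, the sketch below writes a classical RK4 step as repeated calls to one right-hand-side function combined with fixed weights; in the R2N2 setting that function could be a small network and the combination weights could be learned. The concrete ODE and step size are illustrative assumptions.

```python
import numpy as np

def rk4_step(f, y, h):
    """One classical RK4 step: repeated (recurrent) evaluations of the same
    right-hand side f, combined with fixed weights. In the R2N2 view, f can be
    a small neural network and the combination weights can be learned."""
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    return y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Example ODE dy/dt = -y with exact solution y(t) = exp(-t).
y, h = np.array([1.0]), 0.1
for _ in range(10):
    y = rk4_step(lambda u: -u, y, h)
print(y, np.exp(-1.0))  # close agreement after integrating to t = 1
```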
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
- Lifted Bregman Training of Neural Networks [28.03724379169264]
We introduce a novel mathematical formulation for the training of feed-forward neural networks with (potentially non-smooth) proximal maps as activation functions.
This formulation is based on Bregman distances, and a key advantage is that its partial derivatives with respect to the network's parameters do not require the computation of derivatives of the network's activation functions.
We present several numerical results that demonstrate that these training approaches can be equally well or even better suited for the training of neural network-based classifiers and (denoising) autoencoders with sparse coding.
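A minimal sketch of the kind of activation this formulation targets: proximal maps, which may be non-smooth. The examples below are standard proximal operators (ReLU as the prox of the nonnegativity indicator, soft-thresholding as the prox of a scaled l1 norm); they illustrate such activations, not the lifted Bregman training scheme itself, and the threshold value is an assumption.

```python
import numpy as np

def prox_nonneg(z):
    """Proximal map of the indicator of {x >= 0}: this is exactly ReLU."""
    return np.maximum(z, 0.0)

def prox_l1(z, lam=0.5):
    """Proximal map of lam * ||x||_1: soft-thresholding, a non-smooth activation."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.linspace(-2, 2, 9)
print(prox_nonneg(z))   # identical to ReLU(z)
print(prox_l1(z))       # sparsity-inducing non-smooth activation
```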
arXiv Detail & Related papers (2022-08-18T11:12:52Z)
- A Sparse Coding Interpretation of Neural Networks and Theoretical Implications [0.0]
Deep convolutional neural networks have achieved unprecedented performance in various computer vision tasks.
We propose a sparse coding interpretation of neural networks that have ReLU activation.
We derive a complete convolutional neural network without normalization and pooling.
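One way to see the connection, under the usual sparse-coding reading (not necessarily the paper's exact derivation): a ReLU with a shifted bias is the closed-form minimizer of a separable nonnegative sparse coding problem. The sketch below checks this numerically with a brute-force grid; the threshold value is an illustrative assumption.

```python
import numpy as np

def nonneg_sparse_code(z, lam):
    """Closed-form minimizer of 0.5*(a - z)**2 + lam*|a| subject to a >= 0.
    It equals ReLU(z - lam), i.e. a ReLU with a shifted bias."""
    return np.maximum(z - lam, 0.0)

# Numerical check: brute-force the 1-D problem on a fine grid for several inputs.
lam = 0.3
grid = np.linspace(0.0, 5.0, 50001)          # feasible set a >= 0
for z in [-1.0, 0.1, 0.3, 2.0]:
    objective = 0.5 * (grid - z) ** 2 + lam * grid
    a_star = grid[np.argmin(objective)]
    print(z, a_star, nonneg_sparse_code(z, lam))  # grid minimizer ~ ReLU(z - lam)
```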
arXiv Detail & Related papers (2021-08-14T21:54:47Z)
- Exploiting Elasticity in Tensor Ranks for Compressing Neural Networks [8.180947044673639]
We exploit a new dimension of elasticity along the input-output channels in a convolutional neural network (CNN).
A novel nuclear-norm rank minimization factorization (NRMF) approach is proposed to search for the reduced tensor ranks during training.
Experiments show the superiority of NRMF over the previous non-elastic variational Bayesian matrix factorization scheme.
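As a rough sketch of rank-elastic regularization (an assumed setup, not the authors' exact NRMF objective), the snippet below computes a nuclear-norm penalty on channel-mode unfoldings of a convolutional weight tensor; adding such a term to the training loss encourages low tensor ranks along those modes.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def nuclear_norm(matrix):
    """Sum of singular values: a convex surrogate for matrix rank."""
    return np.linalg.svd(matrix, compute_uv=False).sum()

rng = np.random.default_rng(0)
# Conv kernel of shape (out_channels, in_channels, kH, kW).
W = rng.normal(size=(16, 8, 3, 3))

# Penalty on the output-channel and input-channel unfoldings; added to the
# training loss, it pushes the kernel toward low ranks along the channel modes.
penalty = nuclear_norm(unfold(W, 0)) + nuclear_norm(unfold(W, 1))
print(penalty)
```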
arXiv Detail & Related papers (2021-05-10T09:26:47Z)
- Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning [58.14930566993063]
We present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks.
We introduce the first provable learning algorithm for linear 2-RNNs defined over sequences of continuous input vectors.
arXiv Detail & Related papers (2020-10-19T15:28:00Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with the nonconvexity of the underlying optimization problem renders parameter learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
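The simplest instance of layer fusion, ignoring the intermediate nonlinearities that the paper treats explicitly, is collapsing two consecutive affine layers into one; the sketch below shows that basic step and verifies it numerically on random weights.

```python
import numpy as np

def fuse_affine(W1, b1, W2, b2):
    """Fuse y = W2 @ (W1 @ x + b1) + b2 into a single affine map y = Wf @ x + bf.
    This ignores intermediate nonlinearities, which the layer-fusion paper
    handles explicitly; it only illustrates the basic fusion step."""
    return W2 @ W1, W2 @ b1 + b2

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(6, 4)), rng.normal(size=6)
W2, b2 = rng.normal(size=(3, 6)), rng.normal(size=3)
x = rng.normal(size=4)

Wf, bf = fuse_affine(W1, b1, W2, b2)
print(np.allclose(Wf @ x + bf, W2 @ (W1 @ x + b1) + b2))  # True
```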
arXiv Detail & Related papers (2020-01-28T18:25:15Z)