The Backpropagation algorithm for a math student
- URL: http://arxiv.org/abs/2301.09977v3
- Date: Wed, 31 May 2023 23:37:17 GMT
- Title: The Backpropagation algorithm for a math student
- Authors: Saeed Damadi, Golnaz Moharrer, Mostafa Cham
- Abstract summary: A Deep Neural Network (DNN) is a composite function of vector-valued functions.
Calculating the gradient of the loss function is non-trivial because the loss of a DNN is itself a composition of several nonlinear functions, each with numerous parameters.
The objective of this paper is to express the gradient of the loss function in terms of a matrix multiplication using the Jacobian operator.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A Deep Neural Network (DNN) is a composite function of vector-valued
functions, and in order to train a DNN, it is necessary to calculate the
gradient of the loss function with respect to all parameters. This calculation
can be a non-trivial task because the loss function of a DNN is a composition
of several nonlinear functions, each with numerous parameters. The
Backpropagation (BP) algorithm leverages the composite structure of the DNN to
efficiently compute the gradient. As a result, the number of layers in the
network does not significantly impact the complexity of the calculation. The
objective of this paper is to express the gradient of the loss function in
terms of a matrix multiplication using the Jacobian operator. This can be
achieved by considering the total derivative of each layer with respect to its
parameters and expressing it as a Jacobian matrix. The gradient can then be
represented as the matrix product of these Jacobian matrices. This approach is
valid because the chain rule can be applied to a composition of vector-valued
functions, and the use of Jacobian matrices allows for the incorporation of
multiple inputs and outputs. By providing concise mathematical justifications,
the results can be made understandable and useful to a broad audience from
various disciplines.
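As an illustration of the Jacobian-product view described above, the following minimal sketch (not code from the paper; the two-layer architecture, tanh activation, squared-error loss, and layer sizes are assumptions chosen for the example) numerically checks that the product of the stage-wise Jacobians equals the gradient returned by reverse-mode automatic differentiation, which is what backpropagation computes.

```python
import jax
import jax.numpy as jnp

def layer(W, x):
    # One dense layer with a tanh activation (illustrative choice).
    return jnp.tanh(W @ x)

def loss(y, target):
    # Scalar squared-error loss (illustrative choice).
    return 0.5 * jnp.sum((y - target) ** 2)

key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
W1 = jax.random.normal(k1, (4, 3))      # first-layer parameters
W2 = jax.random.normal(k2, (2, 4))      # second-layer parameters
x = jax.random.normal(k3, (3,))         # input
target = jax.random.normal(k4, (2,))    # regression target

# Forward pass, keeping the intermediate activation.
h = layer(W1, x)                        # shape (4,)
y = layer(W2, h)                        # shape (2,)

# Jacobian of each stage of the composition.
J_loss_y = jax.jacobian(loss)(y, target)               # d loss / d y,  shape (2,)
J_y_h = jax.jacobian(lambda h_: layer(W2, h_))(h)      # d y / d h,     shape (2, 4)
J_h_W1 = jax.jacobian(lambda W_: layer(W_, x))(W1)     # d h / d W1,    shape (4, 4, 3)
J_h_W1 = J_h_W1.reshape(4, -1)                         # flatten W1 axes: shape (4, 12)

# Chain rule: the gradient w.r.t. W1 is the product of the Jacobian matrices.
grad_W1_chain = (J_loss_y @ J_y_h @ J_h_W1).reshape(W1.shape)

# Reference: reverse-mode automatic differentiation (what backpropagation does).
grad_W1_bp = jax.grad(lambda W_: loss(layer(W2, layer(W_, x)), target))(W1)

print(jnp.allclose(grad_W1_chain, grad_W1_bp, atol=1e-5))   # expected: True
```

The same identity extends to deeper networks: each additional layer contributes one more Jacobian factor to the product, so depth changes the length of the product but not the structure of the calculation.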
Related papers
- Knowledge Composition using Task Vectors with Learned Anisotropic Scaling [51.4661186662329]
We introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level.
We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters.
We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives (a generic sketch of this kind of task-vector combination appears after the list below).
arXiv Detail & Related papers (2024-07-03T07:54:08Z) - AnyLoss: Transforming Classification Metrics into Loss Functions [21.34290540936501]
Evaluation metrics can be used to assess the performance of models in binary classification tasks.
Most metrics are derived from a confusion matrix in a non-differentiable form, making it difficult to generate a differentiable loss function that could directly optimize them.
We propose a general-purpose approach that transforms any confusion matrix-based metric into a loss function, AnyLoss, that is available in optimization processes.
arXiv Detail & Related papers (2024-05-23T16:14:16Z) - Accelerating Fractional PINNs using Operational Matrices of Derivative [0.24578723416255746]
This paper presents a novel operational matrix method to accelerate the training of fractional Physics-Informed Neural Networks (fPINNs).
Our approach involves a non-uniform discretization of the fractional Caputo operator, facilitating swift computation of fractional derivatives within Caputo-type fractional differential problems with $0<\alpha<1$.
The effectiveness of our proposed method is validated across diverse differential equations, including Delay Differential Equations (DDEs) and Systems of Differential Algebraic Equations (DAEs).
arXiv Detail & Related papers (2024-01-25T11:00:19Z) - Combinatory Adjoints and Differentiation [0.0]
We develop a compositional approach for automatic and symbolic differentiation based on categorical constructions in functional analysis.
We show that both symbolic and automatic differentiation can be performed using a differential calculus for generating linear functions.
We also provide a calculus for symbolically computing the adjoint of a derivative without using matrices.
arXiv Detail & Related papers (2022-07-02T14:34:54Z) - Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning [58.91954425047425]
This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning.
We propose a gradient coordinate coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generates most existing gradient coding schemes as special cases.
arXiv Detail & Related papers (2022-06-06T09:25:40Z) - SPINE: Soft Piecewise Interpretable Neural Equations [0.0]
Fully connected networks are ubiquitous but uninterpretable.
This paper takes a novel approach to piecewise fits by using set operations on individual pieces (parts).
It can find a variety of applications where fully connected layers must be replaced by interpretable layers.
arXiv Detail & Related papers (2021-11-20T16:18:00Z) - Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z) - Nonlinear Matrix Approximation with Radial Basis Function Components [0.06922389632860546]
We introduce and investigate matrix approximation by decomposition into a sum of radial basis function (RBF) components.
Our proposed nonlinear counterpart outperforms SVD by drastically reducing memory required to approximate a matrix with the same $L_2$-error for a wide range of matrix types.
arXiv Detail & Related papers (2021-06-03T17:37:41Z) - Automatic differentiation for Riemannian optimization on low-rank matrix and tensor-train manifolds [71.94111815357064]
In scientific computing and machine learning applications, matrices and more general multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions.
One of the popular tools for finding low-rank approximations is Riemannian optimization.
arXiv Detail & Related papers (2021-03-27T19:56:00Z) - Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z) - Automatic Differentiation in ROOT [62.997667081978825]
In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to evaluate the derivative of a function specified by a computer program.
This paper presents AD techniques available in ROOT, supported by Cling, to produce derivatives of arbitrary C/C++ functions.
arXiv Detail & Related papers (2020-04-09T09:18:50Z)
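As referenced in the aTLAS entry above, here is a minimal, generic sketch of combining task vectors (parameter deltas between fine-tuned and pre-trained weights) with learned per-block coefficients. It illustrates the general idea summarized there, not the paper's aTLAS algorithm; the block names, coefficient values, and dictionary layout are assumptions.

```python
# Generic illustration (not the aTLAS implementation): task vectors are
# parameter deltas tau_i = theta_i - theta_0 between fine-tuned and
# pre-trained weights.  Composing knowledge adds a linear combination of
# these deltas back onto the pre-trained weights; one coefficient per
# parameter block (rather than one global scalar per task) gives the
# anisotropic scaling mentioned in the summary.
import jax.numpy as jnp

def combine_task_vectors(theta0, task_vectors, coefficients):
    """theta0, task_vectors[i]: dicts mapping block name -> parameter array.
    coefficients[i]: dict mapping block name -> scalar (learned elsewhere)."""
    combined = dict(theta0)
    for tau, lam in zip(task_vectors, coefficients):
        for block, delta in tau.items():
            combined[block] = combined[block] + lam[block] * delta
    return combined

# Toy usage with two parameter blocks and two task vectors.
theta0 = {"layer1": jnp.zeros((2, 2)), "layer2": jnp.zeros(2)}
tau1   = {"layer1": jnp.ones((2, 2)),  "layer2": jnp.ones(2)}
tau2   = {"layer1": -jnp.ones((2, 2)), "layer2": 2.0 * jnp.ones(2)}
lam1   = {"layer1": 0.5, "layer2": 1.0}   # per-block coefficients (illustrative)
lam2   = {"layer1": 0.1, "layer2": 0.3}

theta = combine_task_vectors(theta0, [tau1, tau2], [lam1, lam2])
print(theta["layer1"])   # 0.5*1 + 0.1*(-1) = 0.4 in every entry
print(theta["layer2"])   # 1.0*1 + 0.3*2 = 1.6 in every entry
```

Learning a separate coefficient for each block, instead of a single scalar per task vector, is what the summary calls anisotropic scaling at the task vector level.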
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.