Combinatory Adjoints and Differentiation
- URL: http://arxiv.org/abs/2207.00847v1
- Date: Sat, 2 Jul 2022 14:34:54 GMT
- Title: Combinatory Adjoints and Differentiation
- Authors: Martin Elsman (University of Copenhagen), Fritz Henglein (University
of Copenhagen), Robin Kaarsgaard (University of Edinburgh), Mikkel Kragh
Mathiesen (University of Copenhagen), Robert Schenck (University of
Copenhagen)
- Abstract summary: We develop a compositional approach for automatic and symbolic differentiation based on categorical constructions in functional analysis.
We show that both symbolic and automatic differentiation can be performed using a differential calculus for generating linear functions.
We also provide a calculus for symbolically computing the adjoint of a derivative without using matrices.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a compositional approach for automatic and symbolic
differentiation based on categorical constructions in functional analysis where
derivatives are linear functions on abstract vectors rather than being limited
to scalars, vectors, matrices or tensors represented as multi-dimensional
arrays. We show that both symbolic and automatic differentiation can be
performed using a differential calculus for generating linear functions
representing Fréchet derivatives based on rules for primitive, constant,
linear and bilinear functions as well as their sequential and parallel
composition. Linear functions are represented in a combinatory domain-specific
language. Finally, we provide a calculus for symbolically computing the adjoint
of a derivative without using matrices, which are too inefficient to use on
high-dimensional spaces. The resulting symbolic representation of a derivative
retains the data-parallel operations from the input program. The combination of
combinatory differentiation and computing formal adjoints turns out to be
behaviorally equivalent to reverse-mode automatic differentiation. In
particular, it provides opportunities for optimizations where matrices are too
inefficient to represent linear functions.
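The abstract describes a calculus in which derivatives are linear maps built from combinators and adjoints are computed symbolically by rewriting rather than by transposing matrices. As a rough illustration only, the following is a minimal Haskell sketch of that idea; the types and names (LinMap, Diff, adjoint, composeD) are invented for this example and are not the paper's actual DSL, which also covers constant, bilinear and parallel-composition rules.
```haskell
-- A tiny combinator language for linear maps, interpreted over vectors
-- represented as lists of Doubles.
data LinMap
  = Id                   -- identity map
  | Scale Double         -- pointwise scaling by a constant
  | Comp LinMap LinMap   -- sequential composition: apply the right map first
  deriving Show

-- Interpret a linear map as a function on vectors (forward mode).
apply :: LinMap -> [Double] -> [Double]
apply Id         v = v
apply (Scale c)  v = map (c *) v
apply (Comp g f) v = apply g (apply f v)

-- Symbolic adjoint, computed by rules rather than by forming a matrix.
-- Composition reverses order, which is the essence of reverse mode.
adjoint :: LinMap -> LinMap
adjoint Id         = Id
adjoint (Scale c)  = Scale c
adjoint (Comp g f) = Comp (adjoint f) (adjoint g)

-- A differentiable function paired with its Frechet derivative at a point;
-- the derivative is a linear map, not a matrix.
data Diff = Diff
  { eval  :: [Double] -> [Double]
  , deriv :: [Double] -> LinMap
  }

-- Chain rule: the derivative of a sequential composition is the sequential
-- composition of the derivatives, taken at the appropriate points.
composeD :: Diff -> Diff -> Diff
composeD g f = Diff
  { eval  = eval g . eval f
  , deriv = \x -> Comp (deriv g (eval f x)) (deriv f x)
  }

-- Example primitive: pointwise scaling is linear, so it is its own derivative.
scalePrim :: Double -> Diff
scalePrim c = Diff { eval = map (c *), deriv = \_ -> Scale c }

main :: IO ()
main = do
  let h  = composeD (scalePrim 3) (scalePrim 2)  -- h(x) = 6 * x, pointwise
      dh = deriv h [1, 2, 3]                     -- derivative at a point
  print (apply dh [1, 0, 0])                     -- forward mode:  [6.0,0.0,0.0]
  print (apply (adjoint dh) [1, 0, 0])           -- reverse mode:  [6.0,0.0,0.0]
```
The only substantive rule here is that the adjoint of a composition reverses the order of its factors; composing formal adjoints therefore traverses the program backwards, which is why the paper can identify this construction with reverse-mode automatic differentiation.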
Related papers
- Two-Point Deterministic Equivalence for Stochastic Gradient Dynamics in Linear Models [76.52307406752556] (arXiv, 2025-02-07)
  We derive a novel deterministic equivalence for the two-point function of a random resolvent.
  We give a unified derivation of the performance of a wide variety of high-dimensional trained linear models with gradient descent.
- Matrix Calculus (for Machine Learning and Beyond) [0.2647285178819813] (arXiv, 2025-01-07)
  This course introduces the extension of differential calculus to functions on more general vector spaces.
  It emphasizes practical computational applications, such as large-scale optimization and machine learning.
- Entrywise application of non-linear functions on orthogonally invariant matrices [44.99833362998488] (arXiv, 2024-12-09)
  We investigate how the entrywise application of a non-linear function to symmetric invariant random matrix ensembles alters the spectral distribution.
  We find that in all those cases a Gaussian equivalence principle holds: the effect of the non-linear function is the same as taking a linear combination of the involved matrices and an additional independent GOE.
- Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry [63.694184882697435] (arXiv, 2024-07-15)
  Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations.
  This paper provides a comprehensive and unified understanding of the matrix logarithm and power from a Riemannian geometry perspective.
- The Backpropagation algorithm for a math student [0.0] (arXiv, 2023-01-22)
  A Deep Neural Network (DNN) is a composite function of vector-valued functions.
  The loss function of a DNN is a composition of several nonlinear functions, each with numerous parameters.
  The objective of this paper is to express the gradient of the loss function in terms of a matrix multiplication using the Jacobian operator (a worked form of this chain-rule expression is sketched after this list).
- Decoupling multivariate functions using a nonparametric filtered tensor decomposition [0.29360071145551075] (arXiv, 2022-05-23)
  Decoupling techniques aim at providing an alternative representation of the nonlinearity.
  The so-called decoupled form is often a more efficient parameterisation of the relationship while being highly structured, favouring interpretability.
  In this work, two new algorithms based on filtered tensor decompositions of first-order derivative information are introduced.
- Automatic differentiation for Riemannian optimization on low-rank matrix and tensor-train manifolds [71.94111815357064] (arXiv, 2021-03-27)
  In scientific computing and machine learning applications, matrices and more general multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions.
  One of the popular tools for finding such low-rank approximations is Riemannian optimization.
- A graphical calculus for integration over random diagonal unitary matrices [1.5229257192293197] (arXiv, 2020-07-22)
  We provide a graphical calculus for computing averages of tensor network diagrams.
  We exploit the order structure of the partially ordered set of uniform block permutations.
  A similar calculus is developed for random vectors consisting of independent uniform signs.
- Automatic Differentiation in ROOT [62.997667081978825] (arXiv, 2020-04-09)
  In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to evaluate the derivative of a function specified by a computer program.
  This paper presents AD techniques available in ROOT, supported by Cling, to produce derivatives of arbitrary C/C++ functions.
- Invariant Feature Coding using Tensor Product Representation [75.62232699377877] (arXiv, 2019-06-05)
  We prove that the group-invariant feature vector contains sufficient discriminative information when learning a linear classifier.
  A novel feature model that explicitly considers group action is proposed for principal component analysis and k-means clustering.
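As an illustration of the Jacobian chain-rule view mentioned in "The Backpropagation algorithm for a math student" above, here is a worked form of that expression; the notation is assumed for this sketch and is not taken from that paper.
```latex
% For a loss L = \ell \circ f_k \circ \cdots \circ f_1 with intermediate
% values y_0 = x and y_i = f_i(y_{i-1}), the chain rule gives
\[
  \nabla L(x)^{\top}
    \;=\; D\ell(y_k)\, J_{f_k}(y_{k-1}) \cdots J_{f_1}(y_0),
  \qquad y_0 = x,\quad y_i = f_i(y_{i-1}).
\]
% Backpropagation (reverse-mode AD) evaluates this product from the left as
% vector-matrix products, i.e. it applies the adjoints J_{f_i}^{*} in reverse
% order -- the same structure the combinatory adjoint calculus captures
% without materialising Jacobian matrices.
```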
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.