A Simple and Efficient Tensor Calculus for Machine Learning
- URL: http://arxiv.org/abs/2010.03313v1
- Date: Wed, 7 Oct 2020 10:18:56 GMT
- Title: A Simple and Efficient Tensor Calculus for Machine Learning
- Authors: Sören Laue, Matthias Mitterreiter, Joachim Giesen
- Abstract summary: A key concern is the efficiency of evaluating the expressions and their derivatives that hinges on the representation of these expressions.
An algorithm for computing higher order derivatives of tensor expressions like Jacobians or Hessians has been introduced that is a few orders of magnitude faster than previous state-of-the-art approaches.
Here, we show that using Ricci notation is not necessary for an efficient tensor calculus and develop an equally efficient method for the simpler Einstein notation.
- Score: 18.23338916563815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computing derivatives of tensor expressions, also known as tensor calculus,
is a fundamental task in machine learning. A key concern is the efficiency of
evaluating the expressions and their derivatives that hinges on the
representation of these expressions. Recently, an algorithm for computing
higher order derivatives of tensor expressions like Jacobians or Hessians has
been introduced that is a few orders of magnitude faster than previous
state-of-the-art approaches. Unfortunately, the approach is based on Ricci
notation and hence cannot be incorporated into automatic differentiation
frameworks from deep learning like TensorFlow, PyTorch, autograd, or JAX that
use the simpler Einstein notation. This leaves two options, to either change
the underlying tensor representation in these frameworks or to develop a new,
provably correct algorithm based on Einstein notation. Obviously, the first
option is impractical. Hence, we pursue the second option. Here, we show that
using Ricci notation is not necessary for an efficient tensor calculus and
develop an equally efficient method for the simpler Einstein notation. It turns
out that turning to Einstein notation enables further improvements that lead to
even better efficiency.
The methods that are described in this paper have been implemented in the
online tool www.MatrixCalculus.org for computing derivatives of matrix and
tensor expressions.
An extended abstract of this paper appeared as "A Simple and Efficient Tensor
Calculus", AAAI 2020.
Related papers
- Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers [16.046186753149]
The self-attention mechanism is the key to the success of transformers in recent Large Language Models (LLMs).
We leverage the convolution-like structure of attention matrices to develop an efficient approximation method for attention using convolution matrices.
We hope our new paradigm for accelerating attention computation in transformer models can help their application to longer contexts.
arXiv Detail & Related papers (2024-05-08T17:11:38Z) - An introduction to graphical tensor notation for mechanistic
interpretability [0.0]
It's often easy to get confused about which operations are happening between tensors.
The first half of this document introduces the notation and applies it to some decompositions.
The second half applies it to some existing foundational approaches for mechanistically understanding language models.
arXiv Detail & Related papers (2024-02-02T02:56:01Z) - Adapting Newton's Method to Neural Networks through a Summary of
Higher-Order Derivatives [0.0]
We consider a gradient-based optimization method applied to a function of a vector of variables $\boldsymbol{\theta}$.
This framework encompasses many common use-cases, such as training neural networks by gradient descent.
arXiv Detail & Related papers (2023-12-06T20:24:05Z) - Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing methods are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z) - Monarch: Expressive Structured Matrices for Efficient and Accurate
Training [64.6871423399431]
Large neural networks excel in many domains, but they are expensive to train and fine-tune.
A popular approach to reduce their compute or memory requirements is to replace dense weight matrices with structured ones.
We propose a class of matrices (Monarch) that is hardware-efficient.
arXiv Detail & Related papers (2022-04-01T17:37:29Z) - Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method is to use Matrix Taylor Polynomial (MTP), and the other is to use Matrix Padé Approximants (MPA).
A series of numerical tests show that both methods yield considerable speed-up compared with the SVD or the NS iteration (a reference sketch of the NS iteration appears after this list).
arXiv Detail & Related papers (2022-01-29T10:00:35Z) - Fast Differentiable Matrix Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root.
For the forward propagation, one method is to use Matrix Taylor Polynomial (MTP).
The other method is to use Matrix Padé Approximants (MPA).
arXiv Detail & Related papers (2022-01-21T12:18:06Z) - SOFT: Softmax-free Transformer with Linear Complexity [112.9754491864247]
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention.
Various attempts at approximating self-attention with linear complexity have been made in Natural Language Processing.
We identify that their limitations are rooted in keeping the softmax self-attention during approximations.
For the first time, a softmax-free transformer or SOFT is proposed.
arXiv Detail & Related papers (2021-10-22T17:57:29Z) - Automatic differentiation for Riemannian optimization on low-rank matrix
and tensor-train manifolds [71.94111815357064]
In scientific computing and machine learning applications, matrices and more general multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions.
One popular tool for finding such low-rank approximations is Riemannian optimization.
arXiv Detail & Related papers (2021-03-27T19:56:00Z) - Tensor Completion via Tensor Networks with a Tucker Wrapper [28.83358353043287]
We propose to solve low-rank tensor completion (LRTC) via tensor networks with a Tucker wrapper.
A two-level alternating least squares method is then employed to update the unknown factors.
Numerical simulations show that the proposed algorithm is comparable with state-of-the-art methods.
arXiv Detail & Related papers (2020-10-29T17:54:01Z) - Tensor Relational Algebra for Machine Learning System Design [7.764107702934616]
We present an alternative implementation abstraction called the tensor relational algebra (TRA).
TRA is a set-based algebra based on the relational algebra.
Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML in distributed clusters.
arXiv Detail & Related papers (2020-09-01T15:51:24Z)
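The two "Fast Differentiable Matrix Square Root" entries above compare their MTP and MPA variants against the SVD and the Newton-Schulz (NS) iteration. For reference only, here is a minimal sketch of the coupled NS iteration for the matrix square root and inverse square root, written with JAX so the iteration remains differentiable end-to-end; it is not the method proposed in those papers, and the Frobenius-norm scaling and iteration count are illustrative assumptions.

```python
# Reference sketch of the coupled Newton-Schulz (NS) iteration for A^(1/2) and
# A^(-1/2); not the MTP/MPA methods of the entries above. Assumes A is
# symmetric positive definite. Dividing by the Frobenius norm puts the scaled
# eigenvalues in (0, 1], which is enough for the iteration to converge.
import jax
import jax.numpy as jnp

def newton_schulz_sqrt(A, num_iters=20):
    n = A.shape[0]
    norm = jnp.linalg.norm(A)            # Frobenius norm of A
    Y = A / norm                         # Y_k -> (A / norm)^(1/2)
    Z = jnp.eye(n)                       # Z_k -> (A / norm)^(-1/2)
    for _ in range(num_iters):
        T = 3.0 * jnp.eye(n) - Z @ Y
        Y = 0.5 * Y @ T
        Z = 0.5 * T @ Z
    return Y * jnp.sqrt(norm), Z / jnp.sqrt(norm)

# Usage on a random symmetric positive definite matrix.
B = jax.random.normal(jax.random.PRNGKey(0), (4, 4))
A = B @ B.T + 4.0 * jnp.eye(4)
S, S_inv = newton_schulz_sqrt(A)
print(jnp.max(jnp.abs(S @ S - A)))       # residual should be small

# Because every step is a differentiable matrix product, gradients flow
# through the whole iteration, e.g. for a scalar loss built on A^(1/2):
g = jax.grad(lambda M: jnp.sum(newton_schulz_sqrt(M)[0]))(A)
```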
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.