Tensor Train Multiplication
- URL: http://arxiv.org/abs/2410.19747v2
- Date: Tue, 29 Oct 2024 11:35:57 GMT
- Title: Tensor Train Multiplication
- Authors: Alexios A Michailidis, Christian Fenton, Martin Kiffner,
- Abstract summary: The computational complexity and memory requirements of the TTM algorithm scale as $\chi^3$ and $\chi^2$, respectively.
This represents a significant improvement compared with the conventional approach.
The TTM algorithm paves the way towards GPU accelerated tensor network simulations of computational fluid dynamics problems with large bond dimensions.
- Abstract: We present the Tensor Train Multiplication (TTM) algorithm for the elementwise multiplication of two tensor trains with bond dimension $\chi$. The computational complexity and memory requirements of the TTM algorithm scale as $\chi^3$ and $\chi^2$, respectively. This represents a significant improvement compared with the conventional approach, where the computational complexity scales as $\chi^4$ and memory requirements scale as $\chi^3$. We benchmark the TTM algorithm using flows obtained from artificial turbulence generation and numerically demonstrate its improved runtime and memory scaling compared with the conventional approach. The TTM algorithm paves the way towards GPU accelerated tensor network simulations of computational fluid dynamics problems with large bond dimensions due to its dramatic improvement in memory scaling.
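To make the comparison concrete, below is a minimal NumPy sketch of the textbook elementwise (Hadamard) product of two tensor trains, in which each product core is built from Kronecker products of the input core slices so that the bond dimension multiplies ($\chi \cdot \chi = \chi^2$). This blow-up is what drives the unfavorable conventional scaling the abstract cites; the sketch is not the TTM algorithm itself, and the helper names are illustrative.

```python
import numpy as np

def tt_hadamard(cores_a, cores_b):
    """Textbook elementwise product of two tensor trains.

    Cores have shape (r_left, n, r_right). For each physical index i,
    the product core slice is kron(A_slice, B_slice), so bond dimensions
    multiply -- the growth that the conventional approach must recompress.
    """
    cores_c = []
    for ga, gb in zip(cores_a, cores_b):
        rl, n, rr = ga.shape[0] * gb.shape[0], ga.shape[1], ga.shape[2] * gb.shape[2]
        gc = np.empty((rl, n, rr))
        for i in range(n):
            gc[:, i, :] = np.kron(ga[:, i, :], gb[:, i, :])
        cores_c.append(gc)
    return cores_c

def tt_to_full(cores):
    """Contract a tensor train back into a dense tensor (for checking)."""
    out = cores[0]
    for g in cores[1:]:
        out = np.einsum('...a,aib->...ib', out, g)
    return out.squeeze()

# Sanity check: the product TT reproduces the dense elementwise product.
rng = np.random.default_rng(0)
dims, chi = [2, 3, 2], 4
bonds = [1, chi, chi, 1]
A = [rng.standard_normal((bonds[k], dims[k], bonds[k + 1])) for k in range(3)]
B = [rng.standard_normal((bonds[k], dims[k], bonds[k + 1])) for k in range(3)]
assert np.allclose(tt_to_full(tt_hadamard(A, B)), tt_to_full(A) * tt_to_full(B))
```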
Related papers
- Quantum-Inspired Fluid Simulation of 2D Turbulence with GPU Acceleration [0.894484621897981]
We study an algorithm for solving the Navier-Stokes equations that represents the velocity field as matrix product states.
Our adaptation speeds up simulations by up to 12.1 times.
We find that the algorithm has a potential advantage over direct numerical simulations in the turbulent regime.
arXiv Detail & Related papers (2024-06-25T10:31:20Z)
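The entry above encodes velocity fields as matrix product states. As an illustration of that encoding idea (a generic TT-SVD sketch under assumed conventions, not the paper's code), the following compresses a field sampled on $2^n$ grid points into an MPS with a capped bond dimension:

```python
import numpy as np

def field_to_mps(u, max_bond=None):
    """Encode a 1D field sampled on 2**n grid points as an MPS.

    Standard TT-SVD: reshape into n binary indices and split them off
    one at a time with (optionally truncated) SVDs.
    """
    n = int(np.log2(u.size))
    assert 2 ** n == u.size, "grid size must be a power of two"
    cores, mat, r = [], u.reshape(1, -1), 1
    for _ in range(n - 1):
        mat = mat.reshape(2 * r, -1)
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        keep = len(s) if max_bond is None else min(len(s), max_bond)
        cores.append(U[:, :keep].reshape(r, 2, keep))
        mat, r = s[:keep, None] * Vt[:keep], keep
    cores.append(mat.reshape(r, 2, 1))
    return cores
```

Smooth fields compress to a small bond dimension under this encoding, which is what makes such quantum-inspired representations attractive for turbulence simulation.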
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose the Block-Train, a novel matrix family containing Monarch matrices, which we show performs better than dense layers for the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z)
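As a toy illustration of the dense-versus-structured trade-off in the entry above: the paper's Monarch and Block-Train families are more elaborate (interleaving block-diagonal factors with permutations), so the block-diagonal layer below is only an assumed minimal stand-in.

```python
import numpy as np

def block_diag_matmul(x, blocks):
    """y = x @ B with B block-diagonal: g square blocks of size d/g each.

    Parameters and FLOPs drop from d**2 (dense) to d**2 / g, which is
    the kind of compute saving structured matrix families exploit.
    """
    xs = np.split(x, len(blocks), axis=-1)   # requires d divisible by g
    return np.concatenate([xi @ b for xi, b in zip(xs, blocks)], axis=-1)

# A width-512 layer as 8 blocks of 64x64: 8x fewer parameters than dense.
rng = np.random.default_rng(0)
blocks = [rng.standard_normal((64, 64)) for _ in range(8)]
y = block_diag_matmul(rng.standard_normal((32, 512)), blocks)
```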
- Power of $\ell_1$-Norm Regularized Kaczmarz Algorithms for High-Order Tensor Recovery [8.812294191190896]
We propose novel Kaczmarz algorithms for recovering high-order tensors characterized by sparse and/or low-rank structures.
A variety of numerical experiments on both synthetic and real-world datasets demonstrate the effectiveness and significant potential of the proposed methods.
arXiv Detail & Related papers (2024-05-14T02:06:53Z)
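For readers unfamiliar with the primitive named in the entry above, this is the classical randomized Kaczmarz iteration for a consistent linear system. The paper layers $\ell_1$ regularization and tensor structure on top of it; this baseline sketch does not attempt either.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=20_000, seed=0):
    """Solve a consistent A x = b by projecting onto one row's hyperplane
    per step, sampling rows with probability proportional to ||a_i||^2."""
    rng = np.random.default_rng(seed)
    probs = (A ** 2).sum(axis=1)
    probs /= probs.sum()
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=probs)
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a   # project onto {x : a @ x = b_i}
    return x
```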
- TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing [52.64837396100988]
MEGA is a recent transformer-based architecture, which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as $O(L \log L)$, with $L$ being the sequence length.
We build upon their approach by replacing the linear recurrence with a special temporal convolutional network which permits larger receptive field size with shallower networks, and reduces the computational complexity to $O(L)$.
We evaluate TCNCA on EnWik8 language modeling, long-range-arena (LRA) sequence classification, as well as a synthetic reasoning benchmark, associative recall.
arXiv Detail & Related papers (2023-12-09T16:12:25Z)
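A sketch of the dilated causal convolution idea behind the entry above: stacking layers with doubling dilation gives an exponentially large receptive field at $O(L)$ cost per layer. TCNCA's actual block (and its chunked attention) differs; this is an assumption-level toy.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """y[t] = sum_k w[k] * x[t - k*dilation], zero-padded on the left."""
    y = np.zeros_like(x, dtype=float)
    for k, wk in enumerate(w):
        shift = k * dilation
        y[shift:] += wk * (x if shift == 0 else x[:-shift])
    return y

def tcn_stack(x, filters):
    """Dilation doubles per layer, so depth ~log2(L) covers the sequence
    while every layer stays linear in the sequence length."""
    for layer, w in enumerate(filters):
        x = np.tanh(dilated_causal_conv(x, w, dilation=2 ** layer))
    return x
```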
- RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
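The parallel-training / cheap-recurrent-inference combination in the entry above rests on linear recurrences that admit both a sequential and a whole-sequence evaluation. A toy scalar version is sketched below; RWKV's actual time-mixing is considerably richer, and the closed form used here is just the standard cumulative-product identity.

```python
import numpy as np

def recurrent(a, b):
    """RNN-style O(1)-state inference: h_t = a_t * h_{t-1} + b_t."""
    h, out = 0.0, np.empty_like(b)
    for t in range(len(b)):
        h = a[t] * h + b[t]
        out[t] = h
    return out

def parallel(a, b):
    """Whole-sequence evaluation via h_t = P_t * sum_{s<=t} b_s / P_s,
    with P_t = prod_{r<=t} a_r. Fine as a toy; numerically fragile
    when the a_t approach zero."""
    P = np.cumprod(a)
    return P * np.cumsum(b / P)

a, b = np.full(8, 0.9), np.arange(8.0)
assert np.allclose(recurrent(a, b), parallel(a, b))
```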
- Sublinear scaling in non-Markovian open quantum systems simulations [0.0]
We introduce a numerically exact algorithm to calculate process tensors.
Our approach requires only $\mathcal{O}(n \log n)$ singular value decompositions for environments with infinite memory.
arXiv Detail & Related papers (2023-04-11T15:40:33Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
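For context on the entry above, the standard baseline for entropy-regularized optimal transport is Sinkhorn's algorithm, sketched below. The paper's extragradient method improves the dependence on $\varepsilon$ and is not what this code implements.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=500):
    """Entropy-regularized OT between histograms a and b with cost C.

    Alternating scalings of the Gibbs kernel K = exp(-C / eps);
    returns the transport cost of the (approximately) feasible plan.
    """
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    return (P * C).sum()
```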
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
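The base optimizer in the entry above is AMSGrad; a single update step is sketched below (bias correction omitted, as in the original formulation). The paper's actual contribution, sketching the communicated gradients to reach the $O(k \log(d))$ cost, is not shown, and the function name is illustrative.

```python
import numpy as np

def amsgrad_step(x, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One AMSGrad step; state = (m, v, v_hat), all starting at zeros."""
    m, v, v_hat = state
    m = b1 * m + (1 - b1) * g             # first-moment estimate
    v = b2 * v + (1 - b2) * g * g         # second-moment estimate
    v_hat = np.maximum(v_hat, v)          # monotone max: the AMSGrad fix
    x = x - lr * m / (np.sqrt(v_hat) + eps)
    return x, (m, v, v_hat)
```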
- Latent Matrices for Tensor Network Decomposition and to Tensor Completion [8.301418317685906]
We propose a novel higher-order tensor decomposition model that factorizes the tensor into smaller ones, thereby speeding up computation.
Three optimization algorithms, LMTN-PAM, LMTN-SVD and LMTN-AR, have been developed and applied to the tensor-completion task.
Experimental results show that our LMTN-SVD algorithm is 3-6 times faster than the FCTN-PAM algorithm with only a 1.8-point drop in accuracy.
arXiv Detail & Related papers (2022-10-07T08:19:50Z)
- Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing attempts to linearize self-attention are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z)
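The efficiency argument behind softmax-free attention in the entry above is associativity: dropping the softmax lets $QK^\top V$ be bracketed as $Q(K^\top V)$, turning $O(N^2 d)$ into $O(N d^2)$. The generic feature map below is an assumption for illustration; SOFT itself uses a Gaussian-kernel construction.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: O(N^2 d) time, O(N^2) memory."""
    S = Q @ K.T / np.sqrt(Q.shape[1])
    A = np.exp(S - S.max(axis=1, keepdims=True))
    return (A / A.sum(axis=1, keepdims=True)) @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Softmax-free attention: phi(Q) @ (phi(K).T @ V), O(N d^2) time."""
    Qp, Kp = phi(Q), phi(K)
    Z = Qp @ Kp.sum(axis=0)               # per-query normalizer
    return (Qp @ (Kp.T @ V)) / Z[:, None]
```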
- FC2T2: The Fast Continuous Convolutional Taylor Transform with Applications in Vision and Graphics [8.629912408966145]
We revisit the Taylor series expansion from a modern Machine Learning perspective.
We introduce the Fast Continuous Convolutional Taylor Transform (FC2T2), a variant of the Fast Multipole Method (FMM), that allows for the efficient approximation of low dimensional convolutional operators in continuous space.
arXiv Detail & Related papers (2021-10-29T22:58:42Z)
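A one-dimensional toy of the Taylor/FMM idea in the entry above: precomputing source moments turns an $O(NM)$ kernel summation into $O((N + M)P)$ for a degree-$P$ truncation. FC2T2 targets low-dimensional convolutional operators with far more machinery; everything below is an illustrative assumption.

```python
import numpy as np
from math import factorial

def direct_sum(xs, ys):
    """f(y) = sum_i exp(x_i * y) evaluated directly: O(N * M)."""
    return np.exp(np.outer(ys, xs)).sum(axis=1)

def taylor_sum(xs, ys, order=12):
    """Truncated Taylor expansion around y = 0: exp(x*y) = sum_p (x*y)^p / p!,
    so the moments M_p = sum_i x_i**p / p! are computed once and reused.
    Accurate while |x_i * y| stays modest relative to the order."""
    moments = np.array([(xs ** p).sum() / factorial(p) for p in range(order + 1)])
    return np.polynomial.polynomial.polyval(ys, moments)

xs, ys = np.linspace(-1, 1, 1000), np.linspace(-0.5, 0.5, 7)
assert np.allclose(direct_sum(xs, ys), taylor_sum(xs, ys), rtol=1e-6)
```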
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and accepts no responsibility for any consequences arising from its use.