Tensor Train Multiplication
- URL: http://arxiv.org/abs/2410.19747v2
- Date: Tue, 29 Oct 2024 11:35:57 GMT
- Title: Tensor Train Multiplication
- Authors: Alexios A Michailidis, Christian Fenton, Martin Kiffner,
- Abstract summary: The computational complexity and memory requirements of the TTM algorithm scale as $\chi^3$ and $\chi^2$, respectively.
This represents a significant improvement compared with the conventional approach.
The TTM algorithm paves the way towards GPU accelerated tensor network simulations of computational fluid dynamics problems with large bond dimensions.
- Abstract: We present the Tensor Train Multiplication (TTM) algorithm for the elementwise multiplication of two tensor trains with bond dimension $\chi$. The computational complexity and memory requirements of the TTM algorithm scale as $\chi^3$ and $\chi^2$, respectively. This represents a significant improvement compared with the conventional approach, where the computational complexity scales as $\chi^4$ and memory requirements scale as $\chi^3$. We benchmark the TTM algorithm using flows obtained from artificial turbulence generation and numerically demonstrate its improved runtime and memory scaling compared with the conventional approach. The TTM algorithm paves the way towards GPU accelerated tensor network simulations of computational fluid dynamics problems with large bond dimensions due to its dramatic improvement in memory scaling.
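To make the comparison concrete, below is a minimal NumPy sketch of the textbook elementwise (Hadamard) product of two tensor trains, in which each product core is built from Kronecker products of the input core slices so that the bond dimension multiplies ($\chi \cdot \chi = \chi^2$). This blow-up is what drives the unfavorable conventional scaling the abstract cites; the sketch is not the TTM algorithm itself, and the helper names are illustrative.

```python
import numpy as np

def tt_hadamard(cores_a, cores_b):
    """Textbook elementwise product of two tensor trains.

    Cores have shape (r_left, n, r_right). For each physical index i,
    the product core slice is kron(A_slice, B_slice), so bond dimensions
    multiply -- the growth that the conventional approach must recompress.
    """
    cores_c = []
    for ga, gb in zip(cores_a, cores_b):
        rl, n, rr = ga.shape[0] * gb.shape[0], ga.shape[1], ga.shape[2] * gb.shape[2]
        gc = np.empty((rl, n, rr))
        for i in range(n):
            gc[:, i, :] = np.kron(ga[:, i, :], gb[:, i, :])
        cores_c.append(gc)
    return cores_c

def tt_to_full(cores):
    """Contract a tensor train back into a dense tensor (for checking)."""
    out = cores[0]
    for g in cores[1:]:
        out = np.einsum('...a,aib->...ib', out, g)
    return out.squeeze()

# Sanity check: the product TT reproduces the dense elementwise product.
rng = np.random.default_rng(0)
dims, chi = [2, 3, 2], 4
bonds = [1, chi, chi, 1]
A = [rng.standard_normal((bonds[k], dims[k], bonds[k + 1])) for k in range(3)]
B = [rng.standard_normal((bonds[k], dims[k], bonds[k + 1])) for k in range(3)]
assert np.allclose(tt_to_full(tt_hadamard(A, B)), tt_to_full(A) * tt_to_full(B))
```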
Related papers
- Quantum-Inspired Fluid Simulation of 2D Turbulence with GPU Acceleration [0.894484621897981]
We study an algorithm for solving the Navier-Stokes equations that represents the velocity field as matrix product states.
Our adaptation speeds up simulations by up to 12.1 times.
We find that the algorithm has a potential advantage over direct numerical simulations in the turbulent regime.
arXiv Detail & Related papers (2024-06-25T10:31:20Z)
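The entry above encodes velocity fields as matrix product states. As an illustration of that encoding idea (a generic TT-SVD sketch under assumed conventions, not the paper's code), the following compresses a field sampled on $2^n$ grid points into an MPS with a capped bond dimension:

```python
import numpy as np

def field_to_mps(u, max_bond=None):
    """Encode a 1D field sampled on 2**n grid points as an MPS.

    Standard TT-SVD: reshape into n binary indices and split them off
    one at a time with (optionally truncated) SVDs.
    """
    n = int(np.log2(u.size))
    assert 2 ** n == u.size, "grid size must be a power of two"
    cores, mat, r = [], u.reshape(1, -1), 1
    for _ in range(n - 1):
        mat = mat.reshape(2 * r, -1)
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        keep = len(s) if max_bond is None else min(len(s), max_bond)
        cores.append(U[:, :keep].reshape(r, 2, keep))
        mat, r = s[:keep, None] * Vt[:keep], keep
    cores.append(mat.reshape(r, 2, 1))
    return cores
```

Smooth fields compress to a small bond dimension under this encoding, which is what makes such quantum-inspired representations attractive for turbulence simulation.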
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose the Block-Train, a novel matrix family containing Monarch matrices, which we show performs better than dense layers for the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z)
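As a toy illustration of the dense-versus-structured trade-off in the entry above: the paper's Monarch and Block-Train families are more elaborate (interleaving block-diagonal factors with permutations), so the block-diagonal layer below is only an assumed minimal stand-in.

```python
import numpy as np

def block_diag_matmul(x, blocks):
    """y = x @ B with B block-diagonal: g square blocks of size d/g each.

    Parameters and FLOPs drop from d**2 (dense) to d**2 / g, which is
    the kind of compute saving structured matrix families exploit.
    """
    xs = np.split(x, len(blocks), axis=-1)   # requires d divisible by g
    return np.concatenate([xi @ b for xi, b in zip(xs, blocks)], axis=-1)

# A width-512 layer as 8 blocks of 64x64: 8x fewer parameters than dense.
rng = np.random.default_rng(0)
blocks = [rng.standard_normal((64, 64)) for _ in range(8)]
y = block_diag_matmul(rng.standard_normal((32, 512)), blocks)
```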
- Power of $\ell_1$-Norm Regularized Kaczmarz Algorithms for High-Order Tensor Recovery [8.812294191190896]
We propose novel Kaczmarz algorithms for recovering high-order tensors characterized by sparse and/or low-rank structures.
A variety of numerical experiments on both synthetic and real-world datasets demonstrate the effectiveness and significant potential of the proposed methods.
arXiv Detail & Related papers (2024-05-14T02:06:53Z)
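For readers unfamiliar with the primitive named in the entry above, this is the classical randomized Kaczmarz iteration for a consistent linear system. The paper layers $\ell_1$ regularization and tensor structure on top of it; this baseline sketch does not attempt either.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=20_000, seed=0):
    """Solve a consistent A x = b by projecting onto one row's hyperplane
    per step, sampling rows with probability proportional to ||a_i||^2."""
    rng = np.random.default_rng(seed)
    probs = (A ** 2).sum(axis=1)
    probs /= probs.sum()
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=probs)
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a   # project onto {x : a @ x = b_i}
    return x
```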
- TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing [52.64837396100988]
MEGA is a recent transformer-based architecture, which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as $O(L \log L)$, with $L$ being the sequence length.
We build upon their approach by replacing the linear recurrence with a special temporal convolutional network which permits larger receptive field size with shallower networks, and reduces the computational complexity to $O(L)$.
We evaluate TCNCA on EnWik8 language modeling, long-range-arena (LRA) sequence classification, as well as a synthetic reasoning benchmark, associative recall.
arXiv Detail & Related papers (2023-12-09T16:12:25Z)
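A sketch of the dilated causal convolution idea behind the entry above: stacking layers with doubling dilation gives an exponentially large receptive field at $O(L)$ cost per layer. TCNCA's actual block (and its chunked attention) differs; this is an assumption-level toy.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """y[t] = sum_k w[k] * x[t - k*dilation], zero-padded on the left."""
    y = np.zeros_like(x, dtype=float)
    for k, wk in enumerate(w):
        shift = k * dilation
        y[shift:] += wk * (x if shift == 0 else x[:-shift])
    return y

def tcn_stack(x, filters):
    """Dilation doubles per layer, so depth ~log2(L) covers the sequence
    while every layer stays linear in the sequence length."""
    for layer, w in enumerate(filters):
        x = np.tanh(dilated_causal_conv(x, w, dilation=2 ** layer))
    return x
```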
- RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
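The parallel-training / cheap-recurrent-inference combination in the entry above rests on linear recurrences that admit both a sequential and a whole-sequence evaluation. A toy scalar version is sketched below; RWKV's actual time-mixing is considerably richer, and the closed form used here is just the standard cumulative-product identity.

```python
import numpy as np

def recurrent(a, b):
    """RNN-style O(1)-state inference: h_t = a_t * h_{t-1} + b_t."""
    h, out = 0.0, np.empty_like(b)
    for t in range(len(b)):
        h = a[t] * h + b[t]
        out[t] = h
    return out

def parallel(a, b):
    """Whole-sequence evaluation via h_t = P_t * sum_{s<=t} b_s / P_s,
    with P_t = prod_{r<=t} a_r. Fine as a toy; numerically fragile
    when the a_t approach zero."""
    P = np.cumprod(a)
    return P * np.cumsum(b / P)

a, b = np.full(8, 0.9), np.arange(8.0)
assert np.allclose(recurrent(a, b), parallel(a, b))
```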
- Sublinear scaling in non-Markovian open quantum systems simulations [0.0]
We introduce a numerically exact algorithm to calculate process tensors.
Our approach requires only $\mathcal{O}(n \log n)$ singular value decompositions for environments with infinite memory.
arXiv Detail & Related papers (2023-04-11T15:40:33Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
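For context on the entry above, the standard baseline for entropy-regularized optimal transport is Sinkhorn's algorithm, sketched below. The paper's extragradient method improves the dependence on $\varepsilon$ and is not what this code implements.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=500):
    """Entropy-regularized OT between histograms a and b with cost C.

    Alternating scalings of the Gibbs kernel K = exp(-C / eps);
    returns the transport cost of the (approximately) feasible plan.
    """
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    return (P * C).sum()
```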
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
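The base optimizer in the entry above is AMSGrad; a single update step is sketched below (bias correction omitted, as in the original formulation). The paper's actual contribution, sketching the communicated gradients to reach the $O(k \log(d))$ cost, is not shown, and the function name is illustrative.

```python
import numpy as np

def amsgrad_step(x, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One AMSGrad step; state = (m, v, v_hat), all starting at zeros."""
    m, v, v_hat = state
    m = b1 * m + (1 - b1) * g             # first-moment estimate
    v = b2 * v + (1 - b2) * g * g         # second-moment estimate
    v_hat = np.maximum(v_hat, v)          # monotone max: the AMSGrad fix
    x = x - lr * m / (np.sqrt(v_hat) + eps)
    return x, (m, v, v_hat)
```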
- Latent Matrices for Tensor Network Decomposition and to Tensor Completion [8.301418317685906]
We propose a novel higher-order tensor decomposition model that factorizes the tensor into smaller ones, thereby speeding up computation.
Three optimization algorithms, LMTN-PAM, LMTN-SVD and LMTN-AR, have been developed and applied to the tensor-completion task.
Experimental results show that our LMTN-SVD algorithm is 3-6 times faster than the FCTN-PAM algorithm with only a 1.8-point drop in accuracy.
arXiv Detail & Related papers (2022-10-07T08:19:50Z)
- Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing attempts to linearize self-attention are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z)
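The efficiency argument behind softmax-free attention in the entry above is associativity: dropping the softmax lets $QK^\top V$ be bracketed as $Q(K^\top V)$, turning $O(N^2 d)$ into $O(N d^2)$. The generic feature map below is an assumption for illustration; SOFT itself uses a Gaussian-kernel construction.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: O(N^2 d) time, O(N^2) memory."""
    S = Q @ K.T / np.sqrt(Q.shape[1])
    A = np.exp(S - S.max(axis=1, keepdims=True))
    return (A / A.sum(axis=1, keepdims=True)) @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Softmax-free attention: phi(Q) @ (phi(K).T @ V), O(N d^2) time."""
    Qp, Kp = phi(Q), phi(K)
    Z = Qp @ Kp.sum(axis=0)               # per-query normalizer
    return (Qp @ (Kp.T @ V)) / Z[:, None]
```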
- FC2T2: The Fast Continuous Convolutional Taylor Transform with Applications in Vision and Graphics [8.629912408966145]
We revisit the Taylor series expansion from a modern Machine Learning perspective.
We introduce the Fast Continuous Convolutional Taylor Transform (FC2T2), a variant of the Fast Multipole Method (FMM), that allows for the efficient approximation of low dimensional convolutional operators in continuous space.
arXiv Detail & Related papers (2021-10-29T22:58:42Z)
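A one-dimensional toy of the Taylor/FMM idea in the entry above: precomputing source moments turns an $O(NM)$ kernel summation into $O((N + M)P)$ for a degree-$P$ truncation. FC2T2 targets low-dimensional convolutional operators with far more machinery; everything below is an illustrative assumption.

```python
import numpy as np
from math import factorial

def direct_sum(xs, ys):
    """f(y) = sum_i exp(x_i * y) evaluated directly: O(N * M)."""
    return np.exp(np.outer(ys, xs)).sum(axis=1)

def taylor_sum(xs, ys, order=12):
    """Truncated Taylor expansion around y = 0: exp(x*y) = sum_p (x*y)^p / p!,
    so the moments M_p = sum_i x_i**p / p! are computed once and reused.
    Accurate while |x_i * y| stays modest relative to the order."""
    moments = np.array([(xs ** p).sum() / factorial(p) for p in range(order + 1)])
    return np.polynomial.polynomial.polyval(ys, moments)

xs, ys = np.linspace(-1, 1, 1000), np.linspace(-0.5, 0.5, 7)
assert np.allclose(direct_sum(xs, ys), taylor_sum(xs, ys), rtol=1e-6)
```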
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and accepts no responsibility for any consequences arising from its use.