a-Tucker: Input-Adaptive and Matricization-Free Tucker Decomposition for Dense Tensors on CPUs and GPUs
- URL: http://arxiv.org/abs/2010.10131v1
- Date: Tue, 20 Oct 2020 08:52:14 GMT
- Title: a-Tucker: Input-Adaptive and Matricization-Free Tucker Decomposition for Dense Tensors on CPUs and GPUs
- Authors: Min Li and Chuanfu Xiao and Chao Yang
- Abstract summary: a-Tucker is a new framework for input-adaptive and matricization-free Tucker decomposition of dense tensors.
A machine-learning adaptive solver selector is applied to automatically cope with the variations of both the input data and the hardware.
- Score: 6.308492837096872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tucker decomposition is one of the most popular models for analyzing and
compressing large-scale tensorial data. Existing Tucker decomposition
algorithms usually rely on a single solver to compute the factor matrices and
core tensor, and are not flexible enough to adapt to the diversity of the
input data and the hardware. Moreover, to exploit highly efficient GEMM
kernels, most Tucker decomposition implementations make use of explicit
matricizations, which could introduce extra costs in terms of data conversion
and memory usage. In this paper, we present a-Tucker, a new framework for
input-adaptive and matricization-free Tucker decomposition of dense tensors. A
mode-wise flexible Tucker decomposition algorithm is proposed to enable switching
between different solvers for the factor matrices and core tensor, and a
machine-learning adaptive solver selector is applied to automatically cope with
the variations of both the input data and the hardware. To further improve the
performance and enhance the memory efficiency, we implement a-Tucker in a fully
matricization-free manner without any conversion between tensors and matrices.
Experiments with a variety of synthetic and real-world tensors show that
a-Tucker can substantially outperform existing works on both CPUs and GPUs.
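To make the mode-wise structure concrete, below is a minimal NumPy sketch of a standard HOOI-style Tucker decomposition in which the mode-n tensor-times-matrix (TTM) products are computed with einsum instead of explicit unfoldings of the input tensor. The helper names ttm and hooi are illustrative only; this is not a-Tucker's algorithm or code (there is no solver switching or learned selector here, and the factor update still reshapes the small projected tensor before its SVD). It simply shows the per-mode loop into which a framework like a-Tucker can plug different factor and core solvers.

```python
# Minimal HOOI-style Tucker sketch in NumPy (illustrative only; not a-Tucker).
import numpy as np

def ttm(tensor, matrix, mode):
    """Mode-`mode` tensor-times-matrix product via einsum (no explicit unfolding).
    `matrix` has shape (new_dim, tensor.shape[mode])."""
    letters = "abcdefghijklmnop"
    t_sub = letters[:tensor.ndim]
    m_sub = "z" + t_sub[mode]
    out_sub = t_sub[:mode] + "z" + t_sub[mode + 1:]
    return np.einsum(f"{t_sub},{m_sub}->{out_sub}", tensor, matrix)

def hooi(X, ranks, n_iter=10, seed=0):
    """Return (core, factors) with X ≈ core ×_1 U1 ×_2 U2 ... (higher-order
    orthogonal iteration)."""
    rng = np.random.default_rng(seed)
    nd = X.ndim
    # Random orthonormal initialization (HOSVD init is the usual alternative).
    factors = [np.linalg.qr(rng.standard_normal((X.shape[n], ranks[n])))[0]
               for n in range(nd)]
    for _ in range(n_iter):
        for n in range(nd):
            # Project X by all factors except mode n.
            Y = X
            for m in range(nd):
                if m != n:
                    Y = ttm(Y, factors[m].T, m)
            # Leading left singular vectors of the mode-n unfolding of the
            # (small) projected tensor; this is one possible factor solver.
            Yn = np.moveaxis(Y, n, 0).reshape(Y.shape[n], -1)
            U, _, _ = np.linalg.svd(Yn, full_matrices=False)
            factors[n] = U[:, :ranks[n]]
    core = X
    for n in range(nd):
        core = ttm(core, factors[n].T, n)
    return core, factors

# Usage: recover a synthetic rank-(5, 5, 5) tensor of shape 40 x 50 x 60.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 5, 5))
for n, size in enumerate((40, 50, 60)):
    X = ttm(X, rng.standard_normal((size, 5)), n)
core, factors = hooi(X, ranks=(5, 5, 5))
X_hat = core
for n, U in enumerate(factors):
    X_hat = ttm(X_hat, U, n)
print("relative error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

The einsum-based TTM is the part that avoids materializing unfoldings of the full input tensor, which is the kind of conversion overhead the matricization-free implementation targets; a-Tucker additionally lets each mode's factor and core solver be chosen adaptively per input and hardware.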
Related papers
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose the Block-Train, a novel matrix family containing Monarch matrices, which we show performs better than dense matrices for the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z)
- HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z)
- Tucker-O-Minus Decomposition for Multi-view Tensor Subspace Clustering [36.790637575875635]
We propose a new tensor decomposition called Tucker-O-Minus Decomposition (TOMD) for multi-view clustering.
Numerical experiments on six benchmark data sets demonstrate the superiority of our proposed method in terms of F-score, precision, recall, normalized mutual information, adjusted rand index, and accuracy.
arXiv Detail & Related papers (2022-10-23T07:20:22Z)
- Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing linear-complexity approximations of self-attention are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z)
- Equivariant vector field network for many-body system modeling [65.22203086172019]
Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
- SOFT: Softmax-free Transformer with Linear Complexity [112.9754491864247]
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention.
Various attempts on approximating the self-attention with linear complexity have been made in Natural Language Processing.
We identify that their limitations are rooted in keeping the softmax self-attention during approximations.
We propose SOFT, a softmax-free transformer with linear complexity.
arXiv Detail & Related papers (2021-10-22T17:57:29Z)
- Fast Low-Rank Tensor Decomposition by Ridge Leverage Score Sampling [5.740578698172382]
We study Tucker decompositions and use tools from randomized numerical linear algebra called ridge leverage scores.
We show how to use approximate ridge leverage scores to construct a sketched instance for any ridge regression problem.
We demonstrate the effectiveness of our approximate ridge regression algorithm for large, low-rank Tucker decompositions on both synthetic and real-world data. (A generic sketch of leverage-score sampling for ridge regression is given after this list.)
arXiv Detail & Related papers (2021-07-22T13:32:47Z)
- Low-Rank and Sparse Enhanced Tucker Decomposition for Tensor Completion [3.498620439731324]
We introduce a unified low-rank and sparse enhanced Tucker decomposition model for tensor completion.
Our model possesses a sparse regularization term to promote a sparse core tensor, which is beneficial for tensor data compression.
Notably, our model can handle different types of real-world data sets, since it exploits the potential periodicity and inherent correlation properties that appear in tensors.
arXiv Detail & Related papers (2020-10-01T12:45:39Z)
- Tensor Relational Algebra for Machine Learning System Design [7.764107702934616]
We present an alternative implementation abstraction called the tensor relational algebra (TRA).
TRA is a set-based algebra based on the relational algebra.
Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML in distributed clusters.
arXiv Detail & Related papers (2020-09-01T15:51:24Z)
- Spectral Learning on Matrices and Tensors [74.88243719463053]
We show that tensor decomposition can pick up latent effects that are missed by matrix methods.
We also outline computational techniques to design efficient tensor decomposition methods.
arXiv Detail & Related papers (2020-04-16T22:53:00Z)
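For the ridge-leverage-score entry above (Fast Low-Rank Tensor Decomposition by Ridge Leverage Score Sampling), the sketch below is a self-contained NumPy illustration of the underlying technique: sample rows of a single ridge regression problem proportionally to their ridge leverage scores, then solve the rescaled sketched problem. The helper names ridge_leverage_scores and sketched_ridge are illustrative; the cited paper uses approximate scores and applies the idea inside the ALS subproblems of a Tucker decomposition, neither of which is reproduced here.

```python
# Generic ridge-leverage-score sampling for a ridge regression problem
# (illustrative sketch, not the cited paper's algorithm).
import numpy as np

def ridge_leverage_scores(A, lam):
    """Exact ridge leverage scores tau_i = a_i^T (A^T A + lam*I)^{-1} a_i."""
    d = A.shape[1]
    K = A.T @ A + lam * np.eye(d)
    Z = np.linalg.solve(K, A.T).T        # Z = A K^{-1}, shape (n, d)
    return np.einsum("ij,ij->i", A, Z)   # row-wise inner products

def sketched_ridge(A, b, lam, num_samples, seed=0):
    """Sample rows proportionally to ridge leverage scores, rescale them,
    and solve the sketched ridge regression via its normal equations."""
    rng = np.random.default_rng(seed)
    tau = ridge_leverage_scores(A, lam)
    p = tau / tau.sum()
    idx = rng.choice(A.shape[0], size=num_samples, replace=True, p=p)
    scale = 1.0 / np.sqrt(num_samples * p[idx])
    SA = scale[:, None] * A[idx]
    Sb = scale * b[idx]
    d = A.shape[1]
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

# Usage: a tall least-squares problem solved from ~5% of its rows.
rng = np.random.default_rng(2)
A = rng.standard_normal((20000, 50))
x_true = rng.standard_normal(50)
b = A @ x_true + 0.01 * rng.standard_normal(20000)
x_hat = sketched_ridge(A, b, lam=1.0, num_samples=1000)
print("parameter error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```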
This list is automatically generated from the titles and abstracts of the papers on this site.