Streaming Coresets for Symmetric Tensor Factorization
- URL: http://arxiv.org/abs/2006.01225v2
- Date: Mon, 13 Jul 2020 17:49:46 GMT
- Title: Streaming Coresets for Symmetric Tensor Factorization
- Authors: Rachit Chhaya, Jayesh Choudhari, Anirban Dasgupta, Supratim Shit
- Abstract summary: We show how to factorize tensors efficiently in the streaming setting.
We introduce two novel algorithmic techniques: online filtering and kernelization.
We show applications of our algorithms to learning single topic models.
- Score: 9.181791777532608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Factorizing tensors has recently become an important optimization module in a
number of machine learning pipelines, especially in latent variable models. We
show how to do this efficiently in the streaming setting. Given a set of $n$
vectors, each in $\mathbb{R}^d$, we present algorithms to select a sublinear
number of these vectors as coreset, while guaranteeing that the CP
decomposition of the $p$-moment tensor of the coreset approximates the
corresponding decomposition of the $p$-moment tensor computed from the full
data. We introduce two novel algorithmic techniques: online filtering and
kernelization. Using these two, we present six algorithms that achieve
different tradeoffs of coreset size, update time, and working space, beating or
matching various state-of-the-art algorithms. In the case of matrices
(order-$2$ tensors), our online row sampling algorithm guarantees a $(1 \pm
\epsilon)$ relative-error spectral approximation. We show applications of our
algorithms to learning single topic models.
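To make the matrix-case claim concrete, here is a minimal sketch of online leverage-score filtering in the spirit of the paper's online row sampling. The ridge term, the oversampling constant, and the function name are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def online_row_filter(rows, eps=0.5, delta=1e-3, lam=1e-6, seed=0):
    """Keep a rescaled subsample S of streamed rows so that S^T S
    approximates A^T A up to (1 +/- eps) spectrally. Constants here
    are illustrative, not the paper's tuned values."""
    rng = np.random.default_rng(seed)
    d = len(rows[0])
    M = lam * np.eye(d)                      # ridge keeps M invertible early on
    c = 8.0 * np.log(d / delta) / eps ** 2   # assumed oversampling factor
    sample = []
    for a in rows:
        a = np.asarray(a, dtype=float)
        tau = float(a @ np.linalg.solve(M, a))  # online ridge leverage score
        p = min(1.0, c * tau)
        if rng.random() < p:
            sample.append(a / np.sqrt(p))    # rescale so E[S^T S] = A^T A
        M += np.outer(a, a)                  # update with every streamed row
    return np.array(sample)
```

Each arriving row is kept with probability proportional to its online ridge leverage score and rescaled so the sampled Gram matrix is unbiased in expectation; the coreset stays sublinear whenever the total leverage mass is.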
Related papers
- Fast and Simple Spectral Clustering in Theory and Practice [7.070726553564701]
Spectral clustering is a popular and effective algorithm designed to find $k$ clusters in a graph $G$.
We present a simple spectral clustering algorithm based on a vertex embedding of dimension $O(\log k)$ computed by the power method.
We evaluate the new algorithm on several synthetic and real-world datasets, finding that it is significantly faster than alternative clustering algorithms, while producing results with approximately the same clustering accuracy.
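A minimal sketch of that pipeline follows, assuming the lazy random-walk matrix, iteration counts common in this line of work, and scikit-learn for the k-means step; all of these are illustrative choices, not the paper's exact tuning.

```python
import numpy as np
from sklearn.cluster import KMeans

def power_method_clustering(A, k, seed=0):
    """Cluster a graph with dense adjacency matrix A into k parts using an
    O(log k)-dimensional embedding produced by the power method."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    deg = np.maximum(A.sum(axis=1), 1e-12)
    Dinv = 1.0 / np.sqrt(deg)
    # lazy random-walk matrix M = (I + D^{-1/2} A D^{-1/2}) / 2
    M = 0.5 * (np.eye(n) + Dinv[:, None] * A * Dinv[None, :])
    ell = max(1, int(np.ceil(np.log2(k))))   # embedding dimension O(log k)
    t = max(1, int(10 * np.log(max(n, 2)))) # power-method iterations
    Y = rng.standard_normal((n, ell))
    for _ in range(t):
        Y = M @ Y                            # amplifies the top eigenspace
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)
```

Since the embedding uses only repeated matrix-vector products, no eigensolver call is needed, which is the source of the speedup.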
arXiv Detail & Related papers (2023-10-17T02:31:57Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
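As a rough illustration of the random-feature idea: random ReLU features approximate the NNGP kernel of a one-hidden-layer ReLU network (the order-1 arc-cosine kernel). The width and scaling below are assumptions, and RFAD itself involves considerably more than this sketch.

```python
import numpy as np

def relu_random_features(X, m=4096, seed=0):
    """Random ReLU features: inner products of these features approximate
    the one-hidden-layer ReLU NNGP kernel as the width m grows."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], m))
    return np.maximum(X @ W, 0.0) * np.sqrt(2.0 / m)

# phi(x) @ phi(x') ~= K_NNGP(x, x'), so kernel ridge regression on the
# distilled set reduces to inexpensive linear regression in feature space.
```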
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor
Decompositions [51.19236668224547]
We study low rank approximation of tensors, focusing on the tensor train and Tucker decompositions.
For tensor train decomposition, we give a bicriteria $(1 + \epsilon)$-approximation algorithm with a small bicriteria rank and $O(q \cdot \mathrm{nnz}(A))$ running time.
In addition, we extend our algorithm to tensor networks with arbitrary graphs.
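For orientation, the classical TT-SVD sweep below is the exact-SVD baseline that such bicriteria algorithms accelerate; it is not the paper's algorithm, and `r_max` is an illustrative rank cap.

```python
import numpy as np

def tt_svd(T, r_max):
    """Classical TT-SVD: sweep over modes, truncating an SVD of each
    successive unfolding to build the tensor-train cores."""
    dims = T.shape
    cores, r_prev = [], 1
    M = np.asarray(T, dtype=float)
    for n in dims[:-1]:
        M = M.reshape(r_prev * n, -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(r_max, len(s))
        cores.append(U[:, :r].reshape(r_prev, n, r))  # core: (r_prev, n, r)
        M = s[:r, None] * Vt[:r]      # pass the remainder down the sweep
        r_prev = r
    cores.append(M.reshape(r_prev, dims[-1], 1))      # final core
    return cores
```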
arXiv Detail & Related papers (2022-07-15T11:55:09Z) - Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
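A minimal sketch of the batched idea using unshifted QR iteration follows; the paper's method adds refinements (such as shifting and economical Householder steps) that this omits, so treat it as an illustration only.

```python
import torch

def batched_eig_qr(A, iters=100):
    """Batched symmetric eigendecomposition via unshifted QR iteration.
    A: (B, d, d) batch of symmetric matrices."""
    T = A.clone()
    Qacc = torch.eye(A.shape[-1], dtype=A.dtype, device=A.device) \
                .expand_as(A).clone()
    for _ in range(iters):
        Q, R = torch.linalg.qr(T)   # one batched kernel per iteration
        T = R @ Q                   # similarity transform; off-diagonal decays
        Qacc = Qacc @ Q             # accumulate eigenvectors
    return torch.diagonal(T, dim1=-2, dim2=-1), Qacc
```

Because every step is a batched matrix operation, the whole batch is processed in a few large GPU kernels rather than one small eigensolver call per matrix.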
arXiv Detail & Related papers (2022-07-09T09:14:12Z) - Cost-efficient Gaussian Tensor Network Embeddings for Tensor-structured
Inputs [2.737640280995564]
We use network embeddings to perform dimensionality reduction of tensor-network-structured inputs $x$.
We provide an algorithm to efficiently sketch input data using such embeddings.
We also show the optimality of an existing algorithm for tensor train rounding.
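One way to see the benefit: Gaussian sketches applied factor-by-factor embed a Kronecker-structured input without ever materializing it. The toy below illustrates only this special case, not the paper's embedding, and the sketch sizes are assumptions.

```python
import numpy as np

def kron_sketch(xs, m, seed=0):
    """Sketch x1 (x) x2 (x) ... by sketching each factor separately:
    (G1 x1) (x) (G2 x2) = (G1 (x) G2)(x1 (x) x2), so the full
    Kronecker-product input is never formed."""
    rng = np.random.default_rng(seed)
    out = np.ones(1)
    for x in xs:
        x = np.asarray(x, dtype=float)
        G = rng.standard_normal((m, x.size)) / np.sqrt(m)
        out = np.kron(out, G @ x)
    return out
```

The output has dimension $m^q$ for $q$ factors, so practical constructions recompress between factors; the point here is only that the structured input is sketched without expansion.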
arXiv Detail & Related papers (2022-05-26T05:27:31Z) - Leverage Score Sampling for Tensor Product Matrices in Input Sparsity
Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be applied simultaneously to a dataset $X$ in input-sparsity total time.
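For contrast, the direct dense leverage-score sampler looks as follows; computing scores this way costs a full SVD, which is precisely what the input-sparsity-time technique avoids. Names and sizes are illustrative.

```python
import numpy as np

def leverage_score_sample(A, s, seed=0):
    """Sample s rows of A with probability proportional to their leverage
    scores and rescale, so that E[S^T S] = A^T A."""
    rng = np.random.default_rng(seed)
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    tau = (U ** 2).sum(axis=1)                 # exact row leverage scores
    p = tau / tau.sum()
    idx = rng.choice(A.shape[0], size=s, p=p)  # sample rows proportionally
    S = A[idx] / np.sqrt(s * p[idx, None])     # rescale sampled rows
    return S
```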
arXiv Detail & Related papers (2022-02-09T15:26:03Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Reconstruction Algorithms for Low-Rank Tensors and Depth-3 Multilinear
Circuits [4.129484350382891]
We give new and efficient black-box reconstruction algorithms for some classes of depth-$3$ arithmetic circuits.
Our algorithm works over all fields of characteristic $0$ or of large enough characteristic.
arXiv Detail & Related papers (2021-05-04T20:45:07Z) - Fast Tensor Disentangling Algorithm [0.0]
We introduce an approximate, fast, and simple algorithm to optimize disentangling unitary tensors.
Our algorithm is faster than previous iterative algorithms and often results in a residual entanglement entropy within 10% to 40% of the minimum.
arXiv Detail & Related papers (2021-04-16T18:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.