SWIFT: Scalable Wasserstein Factorization for Sparse Nonnegative Tensors
- URL: http://arxiv.org/abs/2010.04081v2
- Date: Tue, 15 Dec 2020 10:26:56 GMT
- Title: SWIFT: Scalable Wasserstein Factorization for Sparse Nonnegative Tensors
- Authors: Ardavan Afshar, Kejing Yin, Sherry Yan, Cheng Qian, Joyce C. Ho,
Haesun Park, Jimeng Sun
- Abstract summary: We introduce SWIFT, which minimizes the Wasserstein distance between the input tensor and its reconstruction.
SWIFT achieves up to 9.65% and 11.31% relative improvement over baselines for downstream prediction tasks.
- Score: 42.154795547748165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing tensor factorization methods assume that the input tensor follows
some specific distribution (e.g., Poisson, Bernoulli, or Gaussian), and solve
the factorization by minimizing an empirical loss function defined based on
the corresponding distribution. However, this approach suffers from several drawbacks: 1)
In reality, the underlying distributions are complicated and unknown, making it
infeasible to approximate them with a simple distribution. 2) The correlation
across dimensions of the input tensor is not well utilized, leading to
sub-optimal performance. Although heuristics have been proposed to incorporate such
correlation as side information under a Gaussian distribution, they cannot
easily be generalized to other distributions. Thus, a more principled way of
utilizing the correlation in tensor factorization models remains an open
challenge. Without assuming any explicit distribution, we formulate tensor
factorization as an optimal transport problem with a Wasserstein distance, which
can handle non-negative inputs.
We introduce SWIFT, which minimizes the Wasserstein distance between the input
tensor and its reconstruction. In particular, we define the N-th order tensor
Wasserstein loss for the widely used tensor CP factorization and derive an
optimization algorithm that minimizes it. By leveraging the sparsity structure
and different equivalent formulations to improve computational efficiency,
SWIFT is as scalable as other well-known CP algorithms. Using the factor
matrices as features, SWIFT achieves up to 9.65% and 11.31% relative
improvement over baselines on downstream prediction tasks. Under noisy
conditions, SWIFT achieves up to 15% and 17% relative improvement over the
best competitors on the prediction tasks.
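To make the core idea concrete, below is a minimal, hypothetical sketch (not the authors' SWIFT implementation) of measuring the discrepancy between a nonnegative tensor and its CP reconstruction with an entropic-regularized (Sinkhorn) Wasserstein loss. The fiber-wise comparison, the squared-distance ground cost, and names such as sinkhorn_loss and cp_reconstruct are illustrative assumptions, not details taken from the paper.

import numpy as np

def sinkhorn_loss(p, q, C, eps=1.0, n_iter=200):
    """Approximate Wasserstein cost between histograms p and q under ground
    cost C, computed as the transport cost of the entropic-regularized plan."""
    p = p / p.sum()
    q = q / q.sum()
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iter):              # Sinkhorn fixed-point iterations
        v = q / (K.T @ u)
        u = p / (K @ v)
    P = np.diag(u) @ K @ np.diag(v)      # transport plan
    return np.sum(P * C)

def cp_reconstruct(A, B, C_f):
    """Rebuild a 3rd-order tensor from CP factor matrices A, B, C_f."""
    return np.einsum('ir,jr,kr->ijk', A, B, C_f)

rng = np.random.default_rng(0)
I, J, K_, R = 5, 4, 3, 2
X = rng.random((I, J, K_))                       # nonnegative input tensor
A, B, C_f = (rng.random((d, R)) for d in (I, J, K_))
X_hat = cp_reconstruct(A, B, C_f)

# Compare mode-1 fibers of X and X_hat under a simple squared-distance
# ground cost over fiber coordinates (an assumed choice for illustration).
coords = np.arange(I, dtype=float)
C = (coords[:, None] - coords[None, :]) ** 2
loss = np.mean([sinkhorn_loss(X[:, j, k], X_hat[:, j, k], C)
                for j in range(J) for k in range(K_)])
print(f"average fiber-wise Sinkhorn loss: {loss:.4f}")

In an actual factorization loop one would differentiate such a loss with respect to the factor matrices (or use the transport plan in multiplicative updates); this sketch only shows how the Wasserstein objective replaces an element-wise loss like squared error.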
Related papers
- Training normalizing flows with computationally intensive target
probability distributions [0.018416014644193065]
We propose an estimator for normalizing flows based on the REINFORCE algorithm.
It is up to ten times faster in terms of wall-clock time and requires up to 30% less memory.
arXiv Detail & Related papers (2023-08-25T10:40:46Z) - Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss between the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Error Analysis of Tensor-Train Cross Approximation [88.83467216606778]
We provide accuracy guarantees in terms of the entire tensor for both exact and noisy measurements.
Results are verified by numerical experiments, and may have important implications for the usefulness of cross approximations for high-order tensors.
arXiv Detail & Related papers (2022-07-09T19:33:59Z) - Nonparametric Sparse Tensor Factorization with Hierarchical Gamma
Processes [16.79618682556073]
We propose a nonparametric factorization approach for sparsely observed tensors.
We use hierarchical Gamma processes and Poisson random measures to construct a tensor-valued process.
For efficient inference, we use Dirichlet process properties over finite sample partitions, density transformations, and random features.
arXiv Detail & Related papers (2021-10-19T16:17:26Z) - Augmented Tensor Decomposition with Stochastic Optimization [46.16865811396394]
Real-world tensor data are usually high-ordered and have large dimensions with millions or billions of entries.
It is expensive to decompose the whole tensor with traditional algorithms.
This paper proposes augmented tensor decomposition, which effectively incorporates data augmentations to boost downstream classification.
arXiv Detail & Related papers (2021-06-15T06:29:05Z) - MTC: Multiresolution Tensor Completion from Partial and Coarse
Observations [49.931849672492305]
Existing completion formulations mostly rely on partial observations from a single tensor.
We propose an efficient Multi-resolution Completion model (MTC) to solve the problem.
arXiv Detail & Related papers (2021-06-14T02:20:03Z) - Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation
from Incomplete Measurements [30.395874385570007]
A fundamental task is to faithfully recover tensors from highly incomplete measurements.
We develop an algorithm to directly recover the tensor factors in the Tucker decomposition.
We show that it provably converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems.
arXiv Detail & Related papers (2021-04-29T17:44:49Z) - Alternating linear scheme in a Bayesian framework for low-rank tensor
approximation [5.833272638548154]
We find a low-rank representation for a given tensor by solving a Bayesian inference problem.
We present an algorithm that performs the unscented transform in tensor train format.
arXiv Detail & Related papers (2020-12-21T10:15:30Z) - Linear Optimal Transport Embedding: Provable Wasserstein classification
for certain rigid transformations and perturbations [79.23797234241471]
Discriminating between distributions is an important problem in a number of scientific fields.
The Linear Optimal Transportation (LOT) embeds the space of distributions into an $L^2$-space.
We demonstrate the benefits of LOT on a number of distribution classification problems.
arXiv Detail & Related papers (2020-08-20T19:09:33Z) - Uncertainty quantification for nonconvex tensor completion: Confidence
intervals, heteroscedasticity and optimality [92.35257908210316]
We study the problem of estimating a low-rank tensor given incomplete and corrupted observations.
We find that it attains unimprovable rates of $\ell_2$ accuracy.
arXiv Detail & Related papers (2020-06-15T17:47:13Z)