Efficient distributed representations beyond negative sampling
- URL: http://arxiv.org/abs/2303.17475v2
- Date: Mon, 30 Oct 2023 09:34:17 GMT
- Title: Efficient distributed representations beyond negative sampling
- Authors: Lorenzo Dall'Amico and Enrico Maria Belliardo
- Abstract summary: This article describes an efficient method to learn distributed representations, also known as embeddings.
We show that the softmax normalization constants can be estimated in linear time, allowing us to design an efficient optimization strategy.
- Score: 4.5687771576879594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article describes an efficient method to learn distributed
representations, also known as embeddings. This is accomplished by minimizing an
objective function similar to the one introduced in the Word2Vec algorithm and
later adopted in several works. The computational bottleneck of the optimization
is the calculation of the softmax normalization constants, which requires a
number of operations that scales quadratically with the sample size. This
complexity is unsuited for large datasets, and negative sampling is a popular
workaround that allows one to obtain distributed representations in linear time
with respect to the sample size. Negative sampling, however, changes the loss
function and hence solves a different optimization problem from the one
originally proposed. Our contribution is to show that the softmax normalization
constants can be estimated in linear time, allowing us to design an efficient
optimization strategy to learn distributed representations. We test our
approximation on two popular applications related to word and node embeddings.
The results show accuracy competitive with negative sampling at a remarkably
lower computational time.
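The abstract does not spell out the estimator, so the following is only a minimal sketch under one assumed reading: the per-item normalization constants Z_i = sum_k exp(x_i . y_k) are approximated by a Monte Carlo average over a small random subset of terms, which costs linear rather than quadratic time in the sample size. The function names and the uniform sampling scheme are illustrative assumptions, not the paper's actual estimator.

```python
import numpy as np

def exact_log_partition(X, Y):
    """Exact per-item normalization constants log Z_i = log sum_k exp(x_i . y_k).
    Requires all n^2 inner products, hence quadratic cost in the sample size."""
    scores = X @ Y.T                       # (n, n)
    m = scores.max(axis=1, keepdims=True)  # stabilize the log-sum-exp
    return (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))).ravel()

def sampled_log_partition(X, Y, num_samples=256, seed=None):
    """Monte Carlo estimate of the same constants from num_samples random columns,
    so the cost is linear in the sample size (times the number of samples)."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    idx = rng.choice(n, size=num_samples, replace=False)
    scores = X @ Y[idx].T                  # (n, num_samples)
    m = scores.max(axis=1, keepdims=True)
    # The average exp-score over the sampled columns, rescaled by n, estimates the full sum.
    return (m + np.log(np.exp(scores - m).mean(axis=1, keepdims=True) * n)).ravel()

# Toy comparison on random embeddings.
rng = np.random.default_rng(0)
X, Y = rng.normal(scale=0.1, size=(2000, 32)), rng.normal(scale=0.1, size=(2000, 32))
print(np.abs(exact_log_partition(X, Y) - sampled_log_partition(X, Y, seed=1)).mean())
```

In a training loop, estimated log Z_i of this kind would be plugged into the original softmax objective in place of the exact constants, whereas negative sampling replaces the objective itself.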
Related papers
- Approximating Metric Magnitude of Point Sets [4.522729058300309]
Metric magnitude is a measure of the "size" of point clouds with many desirable geometric properties.
It has been adapted to various mathematical contexts and recent work suggests that it can enhance machine learning and optimization algorithms.
In this paper, we study the magnitude problem and show efficient ways of approximating it. We show that it can be cast as a convex optimization problem, but not as a submodular one.
The paper describes two new algorithms - an iterative approximation algorithm that converges fast and is accurate, and a subset selection method that makes the computation even faster.
arXiv Detail & Related papers (2024-09-06T17:15:28Z)
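The summary above does not restate the definition of magnitude or the proposed algorithms. For reference, a minimal sketch of the standard exact computation is below: the magnitude of a finite point cloud is the sum of the entries of the inverse of the similarity matrix Z with Z[i, j] = exp(-d(x_i, x_j)), which costs a cubic-time linear solve and is what faster iterative and subset-selection methods aim to approximate. The helper function is illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def magnitude(points):
    """Exact magnitude of a finite point cloud: the sum of the entries of Z^{-1},
    where Z[i, j] = exp(-d(x_i, x_j)). The linear solve makes this cubic in the
    number of points."""
    Z = np.exp(-cdist(points, points))
    w = np.linalg.solve(Z, np.ones(len(points)))  # solve Z w = 1; magnitude = sum(w)
    return w.sum()

pts = np.random.default_rng(0).normal(size=(200, 3))
print(magnitude(pts))
```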
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the explicit computation of the covariance matrices involved.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
arXiv Detail & Related papers (2023-06-11T07:28:35Z)
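For context, a minimal sketch of the classical CountSketch transform is below: each input row is hashed to one of m buckets with a random sign, so the sketch is applied in a single pass over the data. The paper learns the bucket positions instead of drawing them at random; the random version shown here is only the classical baseline, and the names are illustrative.

```python
import numpy as np

def countsketch(A, m, seed=None):
    """Classical CountSketch: each row of A is hashed to one of m buckets with a
    random sign, so the sketched matrix S @ A is built in one pass over A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    rows = rng.integers(0, m, size=n)        # position of the single nonzero in each column of S
    signs = rng.choice([-1.0, 1.0], size=n)  # its random sign
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, rows, signs[:, None] * A)  # scatter-add each signed row into its bucket
    return SA

A = np.random.default_rng(1).normal(size=(10000, 20))
print(countsketch(A, m=200, seed=2).shape)   # (200, 20) compressed data
```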
- Fast Optimal Locally Private Mean Estimation via Random Projections [58.603579803010796]
We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball.
We propose a new algorithmic framework, ProjUnit, for private mean estimation.
Our framework is deceptively simple: each randomizer projects its input to a random low-dimensional subspace, normalizes the result, and then runs an optimal algorithm.
arXiv Detail & Related papers (2023-06-07T14:07:35Z)
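A minimal sketch of the client-side recipe described in the summary (random projection followed by normalization) is below. The final privatization step is left as a placeholder, since ProjUnit's optimal low-dimensional randomizer and the matching server-side decoding are not described here; names and parameters are illustrative assumptions.

```python
import numpy as np

def projunit_client_step(x, k, seed, privatize):
    """Project the input onto a random k-dimensional subspace, renormalize to the
    unit sphere, then hand the result to a low-dimensional private randomizer.
    Sharing the seed lets the server reconstruct the same projection."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(k, x.shape[0])) / np.sqrt(k)  # random projection matrix
    z = W @ x
    z /= np.linalg.norm(z)
    return privatize(z)

# Placeholder randomizer (identity, no privacy); the paper plugs in an optimal one.
x = np.random.default_rng(0).normal(size=4096)
y = projunit_client_step(x, k=256, seed=42, privatize=lambda z: z)
print(y.shape)
```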
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- High-Dimensional Sparse Bayesian Learning without Covariance Matrices [66.60078365202867]
We introduce a new inference scheme that avoids explicit construction of the covariance matrix.
Our approach couples a little-known diagonal estimation result from numerical linear algebra with the conjugate gradient algorithm.
In several simulations, our method scales better than existing approaches in both computation time and memory.
arXiv Detail & Related papers (2022-02-25T16:35:26Z)
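The diagonal-estimation result is not spelled out in the summary. One well-known estimator of this kind, shown below as an assumed illustration, approximates diag(M^{-1}) from matrix-vector solves only: with Rademacher probes v_s, diag(M^{-1})_i is estimated by sum_s v_s[i] (M^{-1} v_s)[i] / sum_s v_s[i]^2, and each solve is done with conjugate gradient, so the covariance matrix is never formed. This may differ from the paper's exact scheme.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def diag_of_inverse(matvec, n, num_probes=30, seed=None):
    """Probe-based estimate of diag(M^{-1}) using only matrix-vector products with M:
    each M^{-1} v is obtained by conjugate gradient, so M^{-1} is never formed."""
    rng = np.random.default_rng(seed)
    M = LinearOperator((n, n), matvec=matvec)
    num, den = np.zeros(n), np.zeros(n)
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe
        z, _ = cg(M, v)                      # solve M z = v iteratively
        num += v * z
        den += v * v
    return num / den

# Toy SPD matrix; in the intended setting M is only available through products.
A = np.random.default_rng(0).normal(size=(300, 300))
M = A @ A.T + 300.0 * np.eye(300)
est = diag_of_inverse(lambda v: M @ v, n=300, seed=1)
print(np.abs(est - np.diag(np.linalg.inv(M))).max())
```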
- Fast Projected Newton-like Method for Precision Matrix Estimation under Total Positivity [15.023842222803058]
Current algorithms are designed using the block coordinate descent method or the proximal point algorithm.
We propose a novel algorithm based on the two-metric projection method, incorporating a carefully designed search direction and variable partitioning scheme.
Experimental results on synthetic and real-world datasets demonstrate that our proposed algorithm provides a significant improvement in computational efficiency compared to the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-03T14:39:10Z)
- Algorithmic Solution for Systems of Linear Equations, in $\mathcal{O}(mn)$ time [0.0]
We present a novel algorithm that very rapidly attains the sought solution of linear systems of equations.
The execution time is very short compared with state-of-the-art methods.
The paper also includes a theoretical proof of the algorithm's convergence.
arXiv Detail & Related papers (2021-04-26T13:40:31Z)
- Parallel Scheduling Self-attention Mechanism: Generalization and Optimization [0.76146285961466]
We propose a general scheduling algorithm, which is derived from the optimum scheduling for small instances solved by a satisfiability checking (SAT) solver.
Strategies for further optimization on skipping redundant computations are put forward as well, with which reductions of almost 25% and 50% of the original computations are respectively achieved.
The proposed algorithms are applicable regardless of problem size, as long as the number of input vectors is divisible by the number of computing units available in the architecture.
arXiv Detail & Related papers (2020-12-02T12:04:16Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
- Estimating Multiple Precision Matrices with Cluster Fusion Regularization [0.90238471756546]
We propose a penalized likelihood method for estimating multiple precision matrices from different classes.
Most existing methods either incorporate no information on relationships between the precision matrices, or require this information to be known a priori.
arXiv Detail & Related papers (2020-03-01T01:03:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.