Learning the Positions in CountSketch
- URL: http://arxiv.org/abs/2306.06611v2
- Date: Thu, 11 Apr 2024 00:31:28 GMT
- Title: Learning the Positions in CountSketch
- Authors: Yi Li, Honghao Lin, Simin Liu, Ali Vakilian, David P. Woodruff
- Abstract summary: We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
- Score: 49.57951567374372
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem, e.g., low-rank approximation and regression. In the learning-based sketching paradigm proposed by~\cite{indyk2019learning}, the sketch matrix is found by choosing a random sparse matrix, e.g., CountSketch, and then the values of its non-zero entries are updated by running gradient descent on a training data set. Despite the growing body of work on this paradigm, a noticeable omission is that the locations of the non-zero entries of previous algorithms were fixed, and only their values were learned. In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries. Our first proposed algorithm is based on a greedy algorithm. However, one drawback of the greedy algorithm is its slower training time. We fix this issue and propose approaches for learning a sketching matrix for both low-rank approximation and Hessian approximation for second order optimization. The latter is helpful for a range of constrained optimization problems, such as LASSO and matrix estimation with a nuclear norm constraint. Both approaches achieve good accuracy with a fast running time. Moreover, our experiments suggest that our algorithm can still reduce the error significantly even if we only have a very limited number of training matrices.
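As a concrete illustration of the pipeline in the abstract, the minimal NumPy sketch below builds a CountSketch matrix (one ±1 per column at a random row), uses it for standard sketch-and-solve low-rank approximation, and runs a single greedy pass that re-places each column's non-zero to reduce the average error on a small training set. The function names, the greedy sweep, and the training objective are illustrative assumptions, not the authors' implementation (which also learns the non-zero values by gradient descent).

```python
# Illustrative only: a CountSketch with explicit positions/values, the standard
# sketch-and-solve low-rank approximation, and one plausible greedy pass over
# the positions. All names and the objective are assumptions, not the paper's code.
import numpy as np

def countsketch(pos, val, m):
    """Dense CountSketch S (m x n): column j has value val[j] in row pos[j]."""
    n = len(pos)
    S = np.zeros((m, n))
    S[pos, np.arange(n)] = val
    return S

def low_rank_error(A, S, k):
    """Error of the best rank-k approximation of A within rowspace(S @ A)."""
    SA = S @ A                               # m x d compressed matrix
    V, _ = np.linalg.qr(SA.T)                # orthonormal basis of rowspace(SA)
    AV = A @ V                               # project A onto that subspace
    U, s, Wt = np.linalg.svd(AV, full_matrices=False)
    A_k = (U[:, :k] * s[:k] @ Wt[:k]) @ V.T  # rank-k approximation lifted back
    return np.linalg.norm(A - A_k, "fro")

def greedy_positions(train, m, k, rng):
    """One greedy sweep: try every row for each column's non-zero, keep the best."""
    n = train[0].shape[0]
    pos = rng.integers(0, m, size=n)                  # random initial positions
    val = rng.choice([-1.0, 1.0], size=n)             # values kept fixed here
    for j in range(n):
        errs = []
        for r in range(m):
            cand = pos.copy()
            cand[j] = r
            S = countsketch(cand, val, m)
            errs.append(np.mean([low_rank_error(A, S, k) for A in train]))
        pos[j] = int(np.argmin(errs))
    return pos, val

rng = np.random.default_rng(0)
# Tiny synthetic "training set" of approximately low-rank matrices.
train = [rng.standard_normal((60, 8)) @ rng.standard_normal((8, 30)) for _ in range(3)]
pos, val = greedy_positions(train, m=10, k=5, rng=rng)
S = countsketch(pos, val, m=10)
print(np.mean([low_rank_error(A, S, k=5) for A in train]))
```

In practice the non-zero values would also be trained (e.g., by gradient descent through a differentiable surrogate of the objective), and the greedy search would be replaced by the faster approaches the abstract describes; this sketch only conveys the role of positions versus values.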
Related papers
- Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization [32.939120407900035]
Our algorithms feature a simple update rule that requires solving only one linear system per iteration.
We also evaluate the practical performance of our algorithm by comparing it to existing second-order algorithms for minimax optimization.
arXiv Detail & Related papers (2024-06-04T06:56:41Z) - AdaSub: Stochastic Optimization Using Second-Order Information in
Low-Dimensional Subspaces [0.0]
We introduce AdaSub, a search algorithm that computes a search direction based on second-order information in a low-dimensional subspace.
Compared to first-order methods, second-order methods exhibit better convergence characteristics, but the need to compute the Hessian matrix at each iteration results in excessive computational expenses.
Our preliminary numerical results demonstrate that AdaSub surpasses popular optimizers in terms of time and number of iterations required to reach a given accuracy.
arXiv Detail & Related papers (2023-10-30T22:24:23Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Learning Sparsity and Randomness for Data-driven Low Rank Approximation [0.0]
Learning-based low rank approximation algorithms can significantly improve the performance of randomized low rank approximation with a sketch matrix.
We introduce two new methods, Learning Sparsity and Learning Randomness, which try to learn a better sparsity pattern and add randomness to the values of the sketch matrix.
arXiv Detail & Related papers (2022-12-15T23:12:53Z) - Learning Sparse Graphs via Majorization-Minimization for Smooth Node
Signals [8.140698535149042]
We propose an algorithm for learning a sparse weighted graph by estimating its adjacency matrix.
We show that the proposed algorithm converges faster, in terms of the average number of iterations, than several existing methods in the literature.
arXiv Detail & Related papers (2022-02-06T17:06:13Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Learning the Positions in CountSketch [51.15935547615698]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning algorithm that also optimizes the locations of the non-zero entries.
We show this algorithm gives better accuracy for low rank approximation than previous work, and apply it to other problems such as $k$-means clustering for the first time.
arXiv Detail & Related papers (2020-07-20T05:06:29Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z) - Optimal Iterative Sketching with the Subsampled Randomized Hadamard
Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rates for Haar and randomized Hadamard matrices are identical and asymptotically improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction (a minimal sketch of the iterative scheme appears after this list).
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
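Referring to the iterative sketching entry above, the short NumPy sketch below follows the iterative Hessian sketch template for least squares. A Gaussian sketch stands in for the subsampled randomized Hadamard transform to keep the code short; the per-iteration sketch refresh and unit step size are assumptions, not necessarily the exact variant analyzed in that paper.

```python
# A minimal sketch of iterative sketching for least squares (iterative Hessian
# sketch template). The Gaussian sketch is a stand-in for the SRHT; refreshing
# the sketch every iteration and using a unit step are assumptions.
import numpy as np

def iterative_sketch_lstsq(A, b, m, iters, rng):
    """Approximately solve min_x ||Ax - b||_2 using sketched Hessians."""
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        S = rng.standard_normal((m, n)) / np.sqrt(m)   # fresh sketch each iteration
        SA = S @ A                                      # m x d sketched matrix
        g = A.T @ (A @ x - b)                           # exact gradient
        # Newton-type step using the sketched Hessian (SA)^T (SA)
        x = x - np.linalg.solve(SA.T @ SA, g)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 50))
b = rng.standard_normal(2000)
x_sketch = iterative_sketch_lstsq(A, b, m=400, iters=10, rng=rng)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_exact))
```

Replacing the Gaussian sketch with an SRHT, Haar, or CountSketch matrix only changes how `S @ A` is computed; the iteration itself is unchanged.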