Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra
- URL: http://arxiv.org/abs/2409.14309v2
- Date: Sun, 17 Nov 2024 17:51:30 GMT
- Title: Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra
- Authors: Alex Lavaee,
- Abstract summary: We present Sketch 'n Solve, an open-source Python package that implements efficient randomized numerical linear algebra techniques.
We show that our implementation achieves up to 50x speedup over traditional LSQR while maintaining high accuracy, even for ill-conditioned matrices.
The package shows particular promise for applications in machine learning optimization, signal processing, and scientific computing.
- Score: 0.0
- Abstract: We present Sketch 'n Solve, an open-source Python package that implements efficient randomized numerical linear algebra (RandNLA) techniques for solving large-scale least squares problems. While sketch-and-solve algorithms have demonstrated theoretical promise, their practical adoption has been limited by the lack of robust, user-friendly implementations. Our package addresses this gap by providing an optimized implementation built on NumPy and SciPy, featuring both dense and sparse sketching operators with a clean API. Through extensive benchmarking, we demonstrate that our implementation achieves up to 50x speedup over traditional LSQR while maintaining high accuracy, even for ill-conditioned matrices. The package shows particular promise for applications in machine learning optimization, signal processing, and scientific computing.
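The package's own API is not reproduced here; as an illustration of the sketch-and-solve idea it implements, the following plain NumPy/SciPy snippet solves a tall least-squares problem through a dense Gaussian sketch (the function name, sizes, and the choice of a Gaussian operator are illustrative, not the package's actual interface):

```python
import numpy as np
from scipy.linalg import lstsq

def sketch_and_solve(A, b, sketch_rows, rng=None):
    """Approximate least-squares solution via a dense Gaussian sketch.

    Compresses the m x n problem to sketch_rows x n before solving,
    so the cost of the solve no longer scales with m.
    """
    rng = np.random.default_rng(rng)
    m, _ = A.shape
    # Gaussian sketching operator; scaling keeps expected norms unchanged.
    S = rng.standard_normal((sketch_rows, m)) / np.sqrt(sketch_rows)
    x, *_ = lstsq(S @ A, S @ b)
    return x

# Tall problem: the sketched solution is close to the exact one.
rng = np.random.default_rng(0)
A = rng.standard_normal((20000, 50))
b = A @ rng.standard_normal(50) + 0.01 * rng.standard_normal(20000)
x_sketch = sketch_and_solve(A, b, sketch_rows=500, rng=1)
x_exact, *_ = lstsq(A, b)
print(np.linalg.norm(x_sketch - x_exact))  # small relative to ||x_exact||
```

The speedup comes from replacing one solve on 20000 rows with a cheap random projection plus a solve on 500 rows; sparse sketching operators reduce the projection cost further.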
Related papers
- Simulation-based inference with the Python Package sbijax [0.7499722271664147]
sbijax is a Python package that implements a wide variety of state-of-the-art methods in neural simulation-based inference.
The package provides functionality for approximate Bayesian computation, to compute model diagnostics, and to automatically estimate summary statistics.
arXiv Detail & Related papers (2024-09-28T18:47:13Z) - Implementing Recycling Methods for Linear Systems in Python with an Application to Multiple Objective Optimization [6.131436685168423]
We seek to reduce the overall cost of computation when solving linear systems using common recycling methods.
In this work, we assessed the performance of the recycled minimum residual (RMINRES) method along with a map between coefficient matrices.
arXiv Detail & Related papers (2024-02-25T00:15:30Z) - Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
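The learned variants proposed in that paper are beyond a short snippet, but the classical (non-learned) CountSketch they build on can be sketched as follows; the function name and sizes here are illustrative:

```python
import numpy as np

def countsketch(A, k, rng=None):
    """Apply a CountSketch operator S (k x m) to A without forming S.

    Each of the m rows of A is hashed to one of k buckets and added
    with a random sign -- one nonzero per column of S -- so applying
    the sketch costs O(nnz(A)) rather than a dense matrix multiply.
    """
    rng = np.random.default_rng(rng)
    m = A.shape[0]
    buckets = rng.integers(0, k, size=m)      # hash: row -> bucket
    signs = rng.choice([-1.0, 1.0], size=m)   # random +/-1 per row
    SA = np.zeros((k, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)  # accumulate hashed rows
    return SA

A = np.random.default_rng(0).standard_normal((10000, 20))
SA = countsketch(A, k=400, rng=1)
# Norms are preserved in expectation.
print(np.linalg.norm(A), np.linalg.norm(SA))
```

The learning-based approach replaces the random choice of bucket positions (and, in the follow-up work, the sign values) with positions optimized on training data.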
arXiv Detail & Related papers (2023-06-11T07:28:35Z) - NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library that optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS).
We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z) - Complex-to-Real Sketches for Tensor Products with Applications to the Polynomial Kernel [15.535749953841274]
Randomized sketches of a product of $p$ vectors follow a tradeoff between statistical efficiency and computational acceleration.
We propose a simple Complex-to-Real (CtR) modification of well-known sketches that replaces real random projections by complex ones.
We show that our method achieves state-of-the-art performance in terms of accuracy and speed compared to other randomized approximations from the literature.
arXiv Detail & Related papers (2022-02-04T09:15:43Z) - Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update [88.73437209862891]
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
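Newton-LESS itself sparsifies the sketching matrix; as a minimal dense-Gaussian illustration of the sketched Newton update it improves on, the step below sketches the Hessian of a least-squares objective (function name, sizes, and the quadratic objective are all illustrative choices, not the paper's setting):

```python
import numpy as np

def sketched_newton_step(A, b, x, sketch_rows, rng=None):
    """One Newton step for f(x) = 0.5*||Ax - b||^2 with a sketched Hessian.

    The exact Hessian is A.T @ A; instead we form (S A).T @ (S A) with a
    Gaussian sketch S, so the n x n Gram matrix is built from sketch_rows
    rows rather than all m rows of A.
    """
    rng = np.random.default_rng(rng)
    m = A.shape[0]
    S = rng.standard_normal((sketch_rows, m)) / np.sqrt(sketch_rows)
    SA = S @ A
    grad = A.T @ (A @ x - b)   # exact gradient (cheap: O(mn))
    H_sketch = SA.T @ SA       # sketched Hessian approximation
    return x - np.linalg.solve(H_sketch, grad)

rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 30))
b = rng.standard_normal(5000)
x = np.zeros(30)
for _ in range(6):             # a few sketched Newton iterations
    x = sketched_newton_step(A, b, x, sketch_rows=600, rng=rng)
```

Because the Hessian is only approximate, each iteration contracts the error by a factor governed by the sketch quality rather than converging in one step; Newton-LESS shows the Gaussian entries can be mostly zeroed out without degrading this rate.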
arXiv Detail & Related papers (2021-07-15T17:33:05Z) - Speeding Up OPFython with Numba [0.0]
Optimum-Path Forest (OPF) has proven to be a state-of-the-art algorithm comparable to Logistic Regression and Support Vector Machines.
Recently, its Python-based version, denoted as OPFython, has been proposed to provide a more friendly framework and a faster prototyping environment.
This paper proposes a simple yet highly efficient speed-up using the Numba package, which accelerates NumPy-based calculations and aims to increase the algorithm's overall performance.
arXiv Detail & Related papers (2021-06-22T14:39:32Z) - Learning the Positions in CountSketch [51.15935547615698]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning algorithm that also optimizes the locations of the non-zero entries.
We show this algorithm gives better accuracy for low rank approximation than previous work, and apply it to other problems such as $k$-means clustering for the first time.
arXiv Detail & Related papers (2020-07-20T05:06:29Z) - Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python [77.33905890197269]
We describe a new library which implements a unified pathwise coordinate optimization for a variety of sparse learning problems.
The library is coded in C++ and provides user-friendly R and Python wrappers.
arXiv Detail & Related papers (2020-06-27T02:39:24Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.