Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing
- URL: http://arxiv.org/abs/2502.00882v1
- Date: Sun, 02 Feb 2025 19:23:46 GMT
- Title: Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing
- Authors: Gil Goldshlager, Jiang Hu, Lin Lin
- Abstract summary: We analyze the convergence of the randomized block Kaczmarz method (RBK) when the data is sampled uniformly, showing that its iterates converge in a Monte Carlo sense to a $\textit{weighted}$ least-squares solution.
We resolve these issues by incorporating regularization into the RBK iterations.
Numerical experiments, including examples arising from natural gradient optimization, suggest that the regularized algorithm, ReBlocK, outperforms minibatch stochastic gradient descent for realistic problems that exhibit fast singular value decay.
- Score: 2.542838926315103
- License:
- Abstract: Due to the ever growing amounts of data leveraged for machine learning and scientific computing, it is increasingly important to develop algorithms that sample only a small portion of the data at a time. In the case of linear least-squares, the randomized block Kaczmarz method (RBK) is an appealing example of such an algorithm, but its convergence is only understood under sampling distributions that require potentially prohibitively expensive preprocessing steps. To address this limitation, we analyze RBK when the data is sampled uniformly, showing that its iterates converge in a Monte Carlo sense to a $\textit{weighted}$ least-squares solution. Unfortunately, for general problems the condition number of the weight matrix and the variance of the iterates can become arbitrarily large. We resolve these issues by incorporating regularization into the RBK iterations. Numerical experiments, including examples arising from natural gradient optimization, suggest that the regularized algorithm, ReBlocK, outperforms minibatch stochastic gradient descent for realistic problems that exhibit fast singular value decay.
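To make the iteration concrete, below is a minimal NumPy sketch of a uniformly sampled block Kaczmarz step together with a Tikhonov-regularized variant in the spirit of ReBlocK. The function name, the exact regularization form, and the parameter choices are illustrative assumptions based only on the abstract, not the paper's algorithm as published.

```python
import numpy as np

def block_kaczmarz(A, b, x0, block_size, n_iters, lam=0.0, seed=None):
    """Hedged sketch: randomized block Kaczmarz with uniform row sampling.

    lam = 0 gives the plain RBK projection step; lam > 0 adds a Tikhonov-style
    damping term, loosely following the ReBlocK idea described in the abstract.
    """
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        # Sample a block of rows uniformly at random -- no preprocessing of A.
        idx = rng.choice(m, size=block_size, replace=False)
        A_S, b_S = A[idx], b[idx]
        r = b_S - A_S @ x  # residual on the sampled block
        if lam > 0.0:
            # Regularized update: x <- x + A_S^T (A_S A_S^T + lam I)^{-1} r
            x += A_S.T @ np.linalg.solve(A_S @ A_S.T + lam * np.eye(block_size), r)
        else:
            # Plain RBK update: x <- x + A_S^+ r (projection onto the block's solution set)
            x += np.linalg.pinv(A_S) @ r
    return x

# Toy usage on an overdetermined least-squares problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 50))
b = A @ np.ones(50) + 0.01 * rng.standard_normal(1000)
x_hat = block_kaczmarz(A, b, np.zeros(50), block_size=20, n_iters=500, lam=1e-3)
```

With `lam = 0` the step is the standard RBK projection onto the sampled block; a small positive `lam` damps each step, which is one simple way regularization can keep the variance of the iterates under control.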
Related papers
- A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we consider a robust phase retrieval problem in which the task is to recover an unknown signal.
Our proposed oracle avoids the need for a computationally expensive spectral step, relying instead on a simple gradient step while handling outliers.
arXiv Detail & Related papers (2024-09-07T06:37:23Z) - Learning Large Scale Sparse Models [6.428186644949941]
We consider learning sparse models in large scale settings, where the number of samples and the feature dimension can grow as large as millions or billions.
We propose to learn sparse models such as Lasso in an online manner where, in each iteration, only one randomly chosen sample is revealed to perform a sparse gradient update.
Thereby, the memory cost is independent of the sample size, and the gradient evaluation for a single sample is efficient.
arXiv Detail & Related papers (2023-01-26T06:29:49Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of kernel ridge regression (KRR) require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Rejection sampling from shape-constrained distributions in sublinear time [14.18847457501901]
We study the query complexity of rejection sampling in a minimax framework for various classes of discrete distributions.
Our results provide new algorithms for sampling whose complexity scales sublinearly with the alphabet size.
arXiv Detail & Related papers (2021-05-29T01:00:42Z) - Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Smooth Deformation Field-based Mismatch Removal in Real-time [10.181846237133167]
Non-rigid deformation makes it difficult to remove mismatches because no parametric transformation can be found.
We propose an algorithm based on the re-weighting and 1-point RANSAC strategy (R1P-RNSC), together with a second, complementary algorithm.
We show that the combination of the two algorithms achieves the best accuracy compared to other state-of-the-art methods.
arXiv Detail & Related papers (2020-07-16T18:20:25Z) - Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z) - Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of the stochastic proximal gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity bounds for constant and variable stepsize iteration policies, obtaining that, for minibatch size $N$, after $\mathcal{O}(\frac{1}{N\epsilon})$ iterations, $\epsilon$-suboptimality is attained in expected quadratic distance to the optimal solution.
arXiv Detail & Related papers (2020-03-30T10:43:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.