Carath\'eodory Sampling for Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2006.01819v2
- Date: Wed, 25 Nov 2020 17:13:40 GMT
- Title: Carath\'eodory Sampling for Stochastic Gradient Descent
- Authors: Francesco Cosentino, Harald Oberhauser, Alessandro Abate
- Abstract summary: We present an approach that is inspired by classical results of Tchakaloff and Carath'eodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
- Score: 79.55586575988292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many problems require to optimize empirical risk functions over large data
sets. Gradient descent methods that calculate the full gradient in every
descent step do not scale to such datasets. Various flavours of Stochastic
Gradient Descent (SGD) replace the expensive summation that computes the full
gradient by approximating it with a small sum over a randomly selected
subsample of the data set that in turn suffers from a high variance. We present
a different approach that is inspired by classical results of Tchakaloff and
Carath\'eodory about measure reduction. These results allow to replace an
empirical measure with another, carefully constructed probability measure that
has a much smaller support, but can preserve certain statistics such as the
expected gradient. To turn this into scalable algorithms we firstly, adaptively
select the descent steps where the measure reduction is carried out; secondly,
we combine this with Block Coordinate Descent so that measure reduction can be
done very cheaply. This makes the resulting methods scalable to
high-dimensional spaces. Finally, we provide an experimental validation and
comparison.
Related papers
- Sampling from Gaussian Process Posteriors using Stochastic Gradient
Descent [43.097493761380186]
gradient algorithms are an efficient method of approximately solving linear systems.
We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
arXiv Detail & Related papers (2023-06-20T15:07:37Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable togradient-based optimization and allow to trade off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Preferential Subsampling for Stochastic Gradient Langevin Dynamics [3.158346511479111]
gradient MCMC offers an unbiased estimate of the gradient of the log-posterior with a small, uniformly-weighted subsample of the data.
The resulting gradient estimator may exhibit a high variance and impact sampler performance.
We demonstrate that such an approach can maintain the same level of accuracy while substantially reducing the average subsample size that is used.
arXiv Detail & Related papers (2022-10-28T14:56:18Z) - Convergence of Batch Stochastic Gradient Descent Methods with
Approximate Gradients and/or Noisy Measurements: Theory and Computational
Results [0.9900482274337404]
We study convex optimization using a very general formulation called BSGD (Block Gradient Descent)
We establish conditions for BSGD to converge to the global minimum, based on approximation theory.
We show that when approximate gradients are used, BSGD converges while momentum-based methods can diverge.
arXiv Detail & Related papers (2022-09-12T16:23:15Z) - Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through gradient descent (SGD)
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z) - Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdientritHm (SARAH) algorithm is a variance reduced variant of the Gradient Descent (SGD) algorithm.
In this paper, we remove the necessity of a full gradient.
The aggregated gradients serve as an estimate of a full gradient in the SARAH algorithm.
arXiv Detail & Related papers (2021-11-26T06:00:44Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Reparametrizing gradient descent [0.0]
We propose an optimization algorithm which we call norm-adapted gradient descent.
Our algorithm can also be compared to quasi-Newton methods, but we seek roots rather than stationary points.
arXiv Detail & Related papers (2020-10-09T20:22:29Z) - Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data [10.965065178451104]
We study distributed gradient descent (SGD) in the master-worker architecture under Byzantine attacks.
Our algorithm can tolerate up to $frac14$ fraction Byzantine workers.
arXiv Detail & Related papers (2020-05-16T04:15:27Z) - Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: The random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.