Related papers: On the Benefits of Accelerated Optimization in Robust and Private Estimation

On the Benefits of Accelerated Optimization in Robust and Private Estimation

URL: http://arxiv.org/abs/2506.03044v1
Date: Tue, 03 Jun 2025 16:26:30 GMT
Title: On the Benefits of Accelerated Optimization in Robust and Private Estimation
Authors: Laurentiu Andrei Marchis, Po-Ling Loh,
Abstract summary: We study the advantages of accelerated gradient methods, specifically based on the Frank-Wolfe method and projected descent.<n>For the Frank-Wolfe method, our technique is based on a tailored iteration learning rate and a uniform lower bound on the gradient of the $ell$-norm over the constraint set.<n>For accelerating projected descent, we use the popular variant based on Nesterov's momentum.
Score: 2.209921757303168
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the advantages of accelerated gradient methods, specifically based on the Frank-Wolfe method and projected gradient descent, for privacy and heavy-tailed robustness. Our approaches are as follows: For the Frank-Wolfe method, our technique is based on a tailored learning rate and a uniform lower bound on the gradient of the $\ell_2$-norm over the constraint set. For accelerating projected gradient descent, we use the popular variant based on Nesterov's momentum, and we optimize our objective over $\mathbb{R}^p$. These accelerations reduce iteration complexity, translating into stronger statistical guarantees for empirical and population risk minimization. Our analysis covers three settings: non-random data, random model-free data, and parametric models (linear regression and generalized linear models). Methodologically, we approach both privacy and robustness based on noisy gradients. We ensure differential privacy via the Gaussian mechanism and advanced composition, and we achieve heavy-tailed robustness using a geometric median-of-means estimator, which also sharpens the dependency on the dimension of the covariates. Finally, we compare our rates to existing bounds and identify scenarios where our methods attain optimal convergence.

Related papers

Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness [51.302674884611335]
This work introduces a hybrid non-Euclidean optimization method which generalizes norm clipping by combining steepest descent and conditional gradient approaches.<n>We discuss how to instantiate the algorithms for deep learning and demonstrate their properties on image classification and language modeling.
arXiv Detail & Related papers (2025-06-02T17:34:29Z)
Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
gradient-variation online learning aims to achieve regret guarantees that scale with variations in gradients of online functions. Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms. We provide the applications for fast-rate convergence in games and extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z)
Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients. Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z)
Bayesian Nonparametrics Meets Data-Driven Distributionally Robust Optimization [29.24821214671497]
Training machine learning and statistical models often involve optimizing a data-driven risk criterion. We propose a novel robust criterion by combining insights from Bayesian nonparametric (i.e., Dirichlet process) theory and a recent decision-theoretic model of smooth ambiguity-averse preferences. For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet process representations.
arXiv Detail & Related papers (2024-01-28T21:19:15Z)
Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization [17.79206971486723]
We show that arTuRO combines the fast convergence of adaptive moment-based optimization with the capabilities of SGD. We approximate the diagonal elements of the Hessian from gradients, constructing a model of the expected Hessian over time using only first-order information. We show that arTuRO combines the fast convergence of adaptive moment-based optimization with the capabilities of SGD.
arXiv Detail & Related papers (2023-10-31T16:08:38Z)
Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quant clipping strategy for Gradient Descent (SGD) We use gradient new outliers as norm clipping chains. We propose an implementation of the algorithm using Huberiles.
arXiv Detail & Related papers (2023-09-29T15:24:48Z)
Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent [43.097493761380186]
gradient algorithms are an efficient method of approximately solving linear systems. We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum. Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
arXiv Detail & Related papers (2023-06-20T15:07:37Z)
Improved Differentially Private Regression via Gradient Boosting [38.09948758699131]
We give a new algorithm for private linear regression based on gradient boosting. In addition to a comprehensive set of experiments, we give theoretical insights to explain this behavior.
arXiv Detail & Related papers (2023-03-06T19:20:52Z)
Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method for Empirical Risk Minimization [1.4504054468850665]
In Empirical Minimization -- Minimization -- we present a novel computational step-size approach for which we have computational guarantees. We show that our methods exhibit very significant problems on realworld binary datasets. We also present a novel adaptive step-size approach for which we have computational guarantees.
arXiv Detail & Related papers (2022-08-30T00:08:37Z)
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information. We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon [3.867363075280544]
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem. We produce a global linear convergence guarantee for the setting of finite time horizon and state dynamics under weak assumptions. We show results for the case where we assume a model for the underlying dynamics and where we apply the method to the data directly.
arXiv Detail & Related papers (2020-11-20T09:51:49Z)
Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics. We prove that our model-based procedure converges in noisy gradient setting. This is an interesting step for constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level. We show minimax sup-norm convergence rates for non smoothed and smoothed, single task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.