Private (Stochastic) Non-Convex Optimization Revisited: Second-Order
Stationary Points and Excess Risks
- URL: http://arxiv.org/abs/2302.09699v1
- Date: Mon, 20 Feb 2023 00:11:19 GMT
- Title: Private (Stochastic) Non-Convex Optimization Revisited: Second-Order
Stationary Points and Excess Risks
- Authors: Arun Ganesh, Daogao Liu, Sewoong Oh, Abhradeep Thakurta
- Abstract summary: We introduce a new framework that utilizes two different kinds of oracles.
We show that the exponential mechanism can achieve a good population risk bound, and provide a nearly matching lower bound.
- Score: 34.79650838578354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of minimizing a non-convex objective while preserving
the privacy of the examples in the training data. Building upon the previous
variance-reduced algorithm SpiderBoost, we introduce a new framework that
utilizes two different kinds of gradient oracles. The first kind of oracles can
estimate the gradient of one point, and the second kind of oracles, less
precise and more cost-effective, can estimate the gradient difference between
two points. SpiderBoost uses the first kind periodically, once every few steps,
while our framework proposes using the first oracle whenever the total drift
has become large and relies on the second oracle otherwise. This new framework
ensures the gradient estimations remain accurate all the time, resulting in
improved rates for finding second-order stationary points.
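To make the drift-based oracle switching concrete, below is a minimal non-private sketch in Python. The oracle interfaces, the squared-step-norm drift measure, and the threshold rule are illustrative assumptions rather than the paper's exact algorithm; in particular, the privacy noise added to each oracle call is omitted.

```python
import numpy as np

def two_oracle_spider(x0, full_grad, grad_diff, step_size, drift_threshold, num_steps):
    """Drift-triggered variance-reduced gradient descent (illustrative sketch).

    full_grad(x)     -- precise, more expensive oracle: estimates grad f(x)
    grad_diff(x, y)  -- cheaper, less precise oracle: estimates grad f(x) - grad f(y)

    SpiderBoost refreshes the gradient estimate with full_grad once every q steps;
    here we refresh whenever the accumulated drift since the last refresh exceeds
    drift_threshold, which keeps the running estimate accurate at all times.
    """
    x = np.asarray(x0, dtype=float)
    g = full_grad(x)          # initialize with the precise oracle
    drift = 0.0
    for _ in range(num_steps):
        x_next = x - step_size * g
        drift += float(np.linalg.norm(x_next - x) ** 2)
        if drift > drift_threshold:
            g = full_grad(x_next)         # drift too large: reset estimate and drift
            drift = 0.0
        else:
            g = g + grad_diff(x_next, x)  # small drift: cheap incremental update
        x = x_next
    return x
```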
Moreover, we address a more challenging task of finding the global minima of
a non-convex objective using the exponential mechanism. Our findings indicate
that the regularized exponential mechanism can closely match previous empirical
and population risk bounds, without requiring smoothness assumptions for
algorithms with polynomial running time. Furthermore, by disregarding running
time considerations, we show that the exponential mechanism can achieve a good
population risk bound and provide a nearly matching lower bound.
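For the exponential-mechanism results, a schematic form of the sampler is given below; the temperature $k$, regularization weight $\mu$, and constraint set $\Theta$ are placeholders rather than the paper's exact choices, which are tied to the privacy budget and the sensitivity of the empirical objective.

```latex
% Regularized exponential mechanism (schematic form, not the paper's exact parameterization):
% sample theta from a Gibbs density over the constraint set Theta.
\[
  \pi(\theta) \;\propto\; \exp\!\Big( -k \big( \widehat{F}_D(\theta) + \tfrac{\mu}{2}\,\|\theta\|_2^2 \big) \Big),
  \qquad \theta \in \Theta,
\]
% where \widehat{F}_D is the empirical objective on dataset D. Taking \mu = 0
% corresponds to the unregularized exponential mechanism used for the population-risk bound.
```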
Related papers
- Dual Perspectives on Non-Contrastive Self-Supervised Learning [40.79287810164605]
Stop gradient and exponential moving average iterative procedures are commonly used to avoid representation collapse.
This presentation investigates these procedures from the dual theoretical viewpoints of optimization and dynamical systems.
arXiv Detail & Related papers (2025-06-18T07:46:51Z) - Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing an approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Stochastic Gradient Succeeds for Bandits [64.17904367852563]
We show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an $O(1/t)$ rate.
Remarkably, global convergence of the gradient bandit algorithm has not been previously established.
arXiv Detail & Related papers (2024-02-27T06:05:01Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit the connection between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Restarts subject to approximate sharpness: A parameter-free and optimal scheme for first-order methods [0.6554326244334866]
Sharpness is an assumption in continuous optimization that bounds the distance from minima by objective function suboptimality.
Sharpness involves problem-specific constants that are typically unknown, and restart schemes typically reduce convergence rates.
We consider the assumption of approximate sharpness, a generalization of sharpness that incorporates an unknown constant perturbation to the objective function error.
arXiv Detail & Related papers (2023-01-05T19:01:41Z) - Non-Convergence and Limit Cycles in the Adam optimizer [0.0]
We show that limit cycles of period 2 exist in batch mode for simple quadratic objective functions.
We analyze the stability of these limit cycles and relate our analysis to other results where approximate convergence was shown.
arXiv Detail & Related papers (2022-10-05T07:44:33Z) - A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning [13.908826484332282]
We study a new two-time-scale gradient method for solving optimization problems.
Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale gradient algorithm.
We apply our framework to gradient-based policy evaluation algorithms in reinforcement learning.
arXiv Detail & Related papers (2021-09-29T23:15:23Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points with high probability at the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - On Limited-Memory Subsampling Strategies for Bandits [0.0]
We show that a simple deterministic subsampling rule, proposed in the recent work of Baudry et al. (2020), is optimal in one-dimensional exponential families.
We prove that these guarantees also hold when limiting the algorithm's memory to a polylogarithmic function of the time horizon.
We propose a variant of the algorithm in which only the most recent observations are used for subsampling, achieving optimal regret guarantees under the assumption of a known number of abrupt changes.
arXiv Detail & Related papers (2021-06-21T09:11:22Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)