From Adaptive Query Release to Machine Unlearning
- URL: http://arxiv.org/abs/2307.11228v1
- Date: Thu, 20 Jul 2023 20:46:39 GMT
- Title: From Adaptive Query Release to Machine Unlearning
- Authors: Enayat Ullah, Raman Arora
- Abstract summary: We formalize the problem of machine unlearning as design of efficient unlearning algorithms corresponding to learning algorithms.
We show that unlearning in convex optimization (SCO) can be reduced to the above, yielding improved guarantees for the problem.
- Score: 34.25760971846068
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We formalize the problem of machine unlearning as design of efficient
unlearning algorithms corresponding to learning algorithms which perform a
selection of adaptive queries from structured query classes. We give efficient
unlearning algorithms for linear and prefix-sum query classes. As applications,
we show that unlearning in many problems, in particular, stochastic convex
optimization (SCO), can be reduced to the above, yielding improved guarantees
for the problem. In particular, for smooth Lipschitz losses and any $\rho>0$,
our results yield an unlearning algorithm with excess population risk of
$\tilde O\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$ with unlearning
query (gradient) complexity $\tilde O(\rho \cdot \text{Retraining
Complexity})$, where $d$ is the model dimensionality and $n$ is the initial
number of samples. For non-smooth Lipschitz losses, we give an unlearning
algorithm with excess population risk $\tilde
O\big(\frac{1}{\sqrt{n}}+\big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$ with the
same unlearning query (gradient) complexity. Furthermore, in the special case
of Generalized Linear Models (GLMs), such as those in linear and logistic
regression, we get dimension-independent rates of $\tilde
O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{2/3}}\big)$ and $\tilde
O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{1/3}}\big)$ for smooth Lipschitz
and non-smooth Lipschitz losses respectively. Finally, we give generalizations
of the above from one unlearning request to \textit{dynamic} streams consisting
of insertions and deletions.
Related papers
- Learning Multinomial Logits in $O(n \log n)$ time [56.23331174813387]
A Multinomial Logit (MNL) model is composed of a finite universe of items $[n]=1,..., n$, each assigned a positive weight.<n>A query specifies an admissible subset -- called a slate -- and the model chooses one item from that slate with probability proportional to its weight.<n>This query model is also known as the Plackett-Luce model or conditional sampling oracle in the literature.
arXiv Detail & Related papers (2026-01-07T22:07:44Z) - Learning Partitions with Optimal Query and Round Complexities [16.815943270621638]
We consider the basic problem of learning an unknown partition of $n$ elements into at most $k$ sets.<n>Non-adaptive algorithms require $Theta(n2)$ queries, while adaptive algorithms require $Theta(nk)$ queries.<n>Our algorithm only needs $O(log log n)$ rounds to attain the optimal $O(nk)$ query complexity.
arXiv Detail & Related papers (2025-05-08T07:27:29Z) - The Sample Complexity Of ERMs In Stochastic Convex Optimization [13.896417716930687]
We show that in fact $tildeO(fracdepsilon+frac1epsilon2)$ data points are also sufficient.
We further generalize the result and show that a similar upper bound holds for all convex bodies.
arXiv Detail & Related papers (2023-11-09T14:29:25Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $Omega(d1/2/(maxp, epsilon)2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Memory-Constrained Algorithms for Convex Optimization via Recursive
Cutting-Planes [23.94542304111204]
First class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $epsilonleq 1/sqrt d$.
In the regime $epsilon leq d-Omega(d)$, our algorithm with $p=d$ achieves the information-theoretic optimal memory usage and improves the oracle-complexity of gradient descent.
arXiv Detail & Related papers (2023-06-16T17:00:51Z) - Online Lewis Weight Sampling [62.38157566916501]
Cohen and Peng introduced Lewis weight sampling to the theoretical computer science community.
Several works have extended this important primitive to other settings, including the online coreset, sliding window, and adversarial streaming models.
We design the first nearly optimal $ell_p$ subspace embeddings for all $pin(0,infty)$ in the online coreset, sliding window, and the adversarial streaming models.
arXiv Detail & Related papers (2022-07-17T19:40:51Z) - Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound $O(dmax1,p/2log2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z) - On the Sample Complexity of Privately Learning Axis-Aligned Rectangles [16.092248433189816]
We revisit the fundamental problem of learning Axis-Aligned-Rectangles over a finite grid.
We present a novel algorithm that reduces the sample complexity to only $tildeOleftdcdotleft(log*|X|right)1.5right$.
arXiv Detail & Related papers (2021-07-24T04:06:11Z) - Differentially Private Stochastic Optimization: New Results in Convex
and Non-Convex Settings [15.122617358656539]
We present differentially private algorithms for convex and non-smooth general losses (GLLs)
For the convex case, we focus on the family of non-smooth generalized losses (GLLs)
arXiv Detail & Related papers (2021-07-12T17:06:08Z) - Provably Efficient Reinforcement Learning with Linear Function
Approximation Under Adaptivity Constraints [94.76881135901753]
We consider two popular limited adaptivity models: batch learning model and rare policy switch model.
Our proposed LSVI-UCB-Batch algorithm achieves an $tilde O(sqrtd3H3T + dHT/B)$ regret.
For the rare policy switch model, our proposed LSVI-UCB-RareSwitch algorithm enjoys an $tilde O(sqrtd3H3T[1+T/(dH)]dH/B)$ regret.
arXiv Detail & Related papers (2021-01-06T18:56:07Z) - An Algorithm for Learning Smaller Representations of Models With Scarce
Data [0.0]
We present a greedy algorithm for solving binary classification problems in situations where the dataset is too small or not fully representative.
It relies on a trained model with loose accuracy constraints, an iterative hyperparameter pruning procedure, and a function used to generate new data.
arXiv Detail & Related papers (2020-10-15T19:17:51Z) - Hybrid Stochastic-Deterministic Minibatch Proximal Gradient:
Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that HSDMPG can attain an $mathcalObig (1/sttnbig)$ which is at the order of excess error on a learning model.
For loss factors, we prove that HSDMPG can attain an $mathcalObig (1/sttnbig)$ which is at the order of excess error on a learning model.
arXiv Detail & Related papers (2020-09-18T02:18:44Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Revisiting EXTRA for Smooth Distributed Optimization [70.65867695317633]
We give a sharp complexity analysis for EXTRA with the improved $Oleft(left(fracLmu+frac11-sigma_2(W)right)logfrac1epsilon (1-sigma_2(W))right)$.
Our communication complexities of the accelerated EXTRA are only worse by the factors of $left(logfracLmu (1-sigma_2(W))right)$ and $left(logfrac1epsilon (1-
arXiv Detail & Related papers (2020-02-24T08:07:08Z) - Statistically Efficient, Polynomial Time Algorithms for Combinatorial
Semi Bandits [3.9919781077062497]
We consider semi-bandits over a set of arms $cal X subset 0,1d$ where rewards are uncorrelated across items.
For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = cal OBig( d (ln m)2 (ln T) over Delta_min Big)$, but it has computational complexity $cal O(|cal X|)$ which is typically exponential in $d$, and cannot
arXiv Detail & Related papers (2020-02-17T21:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.