Making Progress Based on False Discoveries
- URL: http://arxiv.org/abs/2204.08809v1
- Date: Tue, 19 Apr 2022 11:17:10 GMT
- Title: Making Progress Based on False Discoveries
- Authors: Roi Livni
- Abstract summary: We consider the question of adaptive data analysis within the framework of convex optimization.
We show that for a general analyst (not necessarily gradient descent) $\Omega(1/\epsilon^3)$ samples are required.
Second, we show that, under certain assumptions on the oracle, in an interaction with gradient descent $\tilde{\Omega}(1/\epsilon^{2.5})$ samples are necessary.
- Score: 14.505867475659274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the question of adaptive data analysis within the framework of
convex optimization. We ask how many samples are needed in order to compute
$\epsilon$-accurate estimates of $O(1/\epsilon^2)$ gradients queried by
gradient descent, and we provide two intermediate answers to this question.
First, we show that for a general analyst (not necessarily gradient descent)
$\Omega(1/\epsilon^3)$ samples are required. This rules out the possibility of
a foolproof mechanism. Our construction builds upon a new lower bound (which may
be of interest in its own right) for an analyst who may ask several non-adaptive
questions in a batch, over a fixed and known number $T$ of rounds of adaptivity,
and requires a fraction of true discoveries. We show that for such an analyst
$\Omega (\sqrt{T}/\epsilon^2)$ samples are necessary.
Second, we show that, under certain assumptions on the oracle, in an
interaction with gradient descent $\tilde \Omega(1/\epsilon^{2.5})$ samples are
necessary. Our assumptions are that the oracle has only \emph{first order
access} and is \emph{post-hoc generalizing}. First order access means that it
can only compute the gradients of the sampled function at points queried by the
algorithm. Our assumption of \emph{post-hoc generalization} follows from
existing lower bounds for statistical queries. More generally, we provide
a generic reduction from the standard setting of statistical queries to the
problem of estimating gradients queried by gradient descent.
These results are in contrast with classical bounds that show that with
$O(1/\epsilon^2)$ samples one can optimize the population risk to accuracy of
$O(\epsilon)$ but, as it turns out, with spurious gradients.
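To make the interaction model concrete, here is a minimal Python sketch (not from the paper) of the setting the abstract describes: an analyst runs gradient descent, and a naive mechanism answers each adaptively chosen query point with the empirical gradient over the same $n$ samples; the question studied is how large $n$ must be for all $O(1/\epsilon^2)$ queries to be $\epsilon$-accurate. The quadratic loss and the names `empirical_gradient_oracle` and `gradient_descent_analyst` are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Minimal sketch (not from the paper): an analyst running gradient descent
# interacts with a naive mechanism that answers every adaptive gradient query
# with the empirical gradient over the same n samples.

rng = np.random.default_rng(0)

def sample_loss_grad(w, z):
    # Illustrative convex loss f(w; z) = 0.5 * ||w - z||^2, so the gradient is w - z.
    return w - z

def empirical_gradient_oracle(samples):
    # First-order access: the oracle only evaluates gradients of the sampled
    # losses at the points the analyst actually queries.
    def answer(w):
        return np.mean([sample_loss_grad(w, z) for z in samples], axis=0)
    return answer

def gradient_descent_analyst(oracle, d, eps, eta=0.1):
    # The analyst issues T = O(1/eps^2) adaptive gradient queries.
    T = int(1.0 / eps**2)
    w = np.zeros(d)
    for _ in range(T):
        g = oracle(w)          # adaptive query: depends on all previous answers
        w = w - eta * g
    return w

d, n, eps = 5, 1000, 0.1
samples = rng.normal(size=(n, d))          # z_i ~ D, population mean 0
oracle = empirical_gradient_oracle(samples)
w_hat = gradient_descent_analyst(oracle, d, eps)
# With the quadratic loss above, the population risk minimizer is E[z] = 0.
print("distance to population minimizer:", np.linalg.norm(w_hat))
```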
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
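As a purely illustrative sketch of that single-index setting (not the algorithm or scaling analyzed in the paper), the code below fits a two-layer ReLU network by online SGD to labels $y = \sigma_*(\langle x, \theta \rangle)$; the width, step size, and the choice $\sigma_*(t) = t^2 - 1$ are arbitrary assumptions.

```python
import numpy as np

# Illustrative sketch of the single-index setting: y = sigma_*(<x, theta>),
# fit by online SGD on a small two-layer ReLU network. Width, step size, and
# the link sigma_*(t) = t**2 - 1 are arbitrary choices, not the paper's.

rng = np.random.default_rng(1)
d, width, lr, steps = 20, 64, 0.01, 20000

theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)
sigma_star = lambda t: t**2 - 1.0              # example link function

W = rng.normal(size=(width, d)) / np.sqrt(d)   # first-layer weights
a = rng.normal(size=width) / np.sqrt(width)    # second-layer weights

for _ in range(steps):
    x = rng.normal(size=d)                     # fresh sample each step (online SGD)
    y = sigma_star(x @ theta)
    h = np.maximum(W @ x, 0.0)                 # ReLU features
    err = a @ h - y
    # Gradients of 0.5 * err^2 with respect to a and W.
    grad_a = err * h
    grad_W = err * np.outer(a * (h > 0), x)
    a -= lr * grad_a
    W -= lr * grad_W

# Quick check of the squared error on fresh data.
X = rng.normal(size=(1000, d))
Y = sigma_star(X @ theta)
pred = np.maximum(X @ W.T, 0.0) @ a
print("test MSE:", np.mean((pred - Y) ** 2))
```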
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Near-Optimal Non-Convex Stochastic Optimization under Generalized
Smoothness [21.865728815935665]
Two recent works established the $O(\epsilon^{-3})$ sample complexity to obtain an $O(\epsilon)$-stationary point.
However, both require a large batch size on the order of $\mathrm{poly}(\epsilon^{-1})$, which is not only computationally burdensome but also unsuitable for streaming applications.
In this work, we solve the prior two problems simultaneously by revisiting a simple variant of the STORM algorithm.
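For context, here is a hedged sketch of the classic STORM recursive-momentum estimator (Cutkosky and Orabona, 2019) that the summary refers to; it is the generic batch-size-1 scheme, not the specific variant analyzed in that paper, and the test problem and constant parameters are illustrative.

```python
import numpy as np

# Sketch of the classic STORM-style recursive momentum estimator with batch
# size 1. The defining trick is that the gradient at x_t and at x_{t-1} are
# evaluated with the SAME fresh sample xi_t. Generic STORM, not the paper's
# variant; the quadratic test problem and constants are illustrative.

rng = np.random.default_rng(2)
d, steps, eta, a = 10, 5000, 0.05, 0.1
target = rng.normal(size=d)

def stoch_grad(x, noise):
    # Noisy gradient of f(x) = 0.5 * ||x - target||^2, sharing the noise sample.
    return (x - target) + 0.1 * noise

x_prev = np.zeros(d)
g = stoch_grad(x_prev, rng.normal(size=d))     # initial gradient estimate
x = x_prev - eta * g

for _ in range(steps):
    xi = rng.normal(size=d)                    # one fresh sample xi_t
    gx = stoch_grad(x, xi)                     # gradient at x_t with xi_t
    gx_prev = stoch_grad(x_prev, xi)           # gradient at x_{t-1} with the SAME xi_t
    g = gx + (1.0 - a) * (g - gx_prev)         # recursive momentum update
    x_prev, x = x, x - eta * g

print("distance to optimum:", np.linalg.norm(x - target))
```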
arXiv Detail & Related papers (2023-02-13T00:22:28Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
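For reference, the quantity being estimated is the gradient of the Gaussian-smoothed function, $\nabla (f * \gamma_\rho)(x) = \mathbb{E}_{z \sim \mathcal{N}(0, \rho^2 I)}[\nabla f(x + z)]$. The sketch below is only the plain Monte Carlo estimator of that quantity, not the reweighted ReSQue estimator from the paper; the test function is an arbitrary assumption.

```python
import numpy as np

# Plain Monte Carlo estimate of the gradient of a Gaussian-smoothed function:
#   grad (f * gamma_rho)(x) = E_{z ~ N(0, rho^2 I)}[ grad f(x + z) ].
# This is the baseline quantity that ReSQue estimates more cleverly; it is
# NOT the ReSQue estimator itself. The test function f(x) = ||x||_1 is illustrative.

rng = np.random.default_rng(3)

def grad_f(x):
    # (Almost-everywhere) gradient of f(x) = ||x||_1 is sign(x).
    return np.sign(x)

def smoothed_grad(x, rho, num_samples=1000):
    z = rho * rng.normal(size=(num_samples, x.shape[0]))
    return np.mean([grad_f(x + zi) for zi in z], axis=0)

x = np.array([0.02, -1.0, 3.0])
print("smoothed gradient at x:", smoothed_grad(x, rho=0.1))
```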
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Sample-optimal classical shadows for pure states [0.0]
We consider the classical shadows task for pure states in the setting of both joint and independent measurements.
In the independent measurement setting, we show that $\mathcal{O}(\sqrt{Bd}\,\epsilon^{-1} + \epsilon^{-2})$ samples suffice.
arXiv Detail & Related papers (2022-11-21T19:24:17Z) - Optimal Query Complexities for Dynamic Trace Estimation [59.032228008383484]
We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly.
We provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with $\delta$ failure probability.
Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation.
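For reference, the matrix-vector query model above is the one underlying Hutchinson's estimator: approximate $\mathrm{tr}(A)$ by averaging $g^\top A g$ over random sign (or Gaussian) vectors $g$. Below is a minimal static-setting sketch of that baseline, not the paper's dynamic binary-tree procedure; the test matrix is illustrative.

```python
import numpy as np

# Hutchinson's trace estimator in the matrix-vector query model: approximate
# tr(A) by averaging g^T (A g) over random Rademacher vectors g. This is the
# static baseline the paper builds on, not its dynamic binary-tree procedure.

rng = np.random.default_rng(4)

def hutchinson_trace(matvec, d, num_queries):
    total = 0.0
    for _ in range(num_queries):
        g = rng.choice([-1.0, 1.0], size=d)   # Rademacher query vector
        total += g @ matvec(g)                # one matrix-vector query
    return total / num_queries

d = 100
A = rng.normal(size=(d, d))
A = A @ A.T                                   # symmetric PSD test matrix
est = hutchinson_trace(lambda v: A @ v, d, num_queries=200)
print("estimate:", est, "true trace:", np.trace(A))
```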
arXiv Detail & Related papers (2022-09-30T04:15:44Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Quantum tomography using state-preparation unitaries [0.22940141855172028]
We describe algorithms to obtain an approximate classical description of a $d$-dimensional quantum state when given access to a unitary.
We show that it takes $\widetilde{\Theta}(d/\varepsilon)$ applications of the unitary to obtain an $\varepsilon$-$\ell_2$-approximation of the state.
We give an efficient algorithm for obtaining Schatten $q$-optimal estimates of a rank-$r$ mixed state.
arXiv Detail & Related papers (2022-07-18T17:56:18Z) - Generalization Bounds for Gradient Methods via Discrete and Continuous
Prior [8.76346911214414]
We show a new high-probability generalization bound of order $O\left(\frac{1}{n} + \frac{L^2}{n^2}\sum_{t=1}^{T}(\gamma_t/\varepsilon_t)^2\right)$ for gradient Langevin Dynamics (GLD).
We can also obtain new bounds for certain variants of SGD.
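The bound above is stated in terms of per-step sizes $\gamma_t$ and noise scales $\varepsilon_t$. The sketch below is the usual GLD-style recursion $W_{t+1} = W_t - \gamma_t \nabla F_S(W_t) + \varepsilon_t \xi_t$ with standard Gaussian $\xi_t$; the exact noise parametrization in the paper may differ, and the loss and schedules are illustrative assumptions.

```python
import numpy as np

# Sketch of a gradient Langevin dynamics (GLD) recursion with per-step
# step size gamma_t and Gaussian noise scale eps_t:
#   W_{t+1} = W_t - gamma_t * grad F_S(W_t) + eps_t * xi_t,  xi_t ~ N(0, I).
# The paper's exact noise parametrization may differ; the empirical risk
# and the schedules below are illustrative.

rng = np.random.default_rng(5)
n, d, T = 200, 10, 500
Z = rng.normal(size=(n, d))                    # training sample S

def grad_empirical_risk(w):
    # Empirical risk F_S(w) = (1/n) * sum_i 0.5 * ||w - z_i||^2.
    return w - Z.mean(axis=0)

w = np.zeros(d)
for t in range(1, T + 1):
    gamma_t = 0.5 / t                          # decaying step size
    eps_t = 0.01                               # constant noise scale
    w = w - gamma_t * grad_empirical_risk(w) + eps_t * rng.normal(size=d)

print("distance to empirical minimizer:", np.linalg.norm(w - Z.mean(axis=0)))
```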
arXiv Detail & Related papers (2022-05-27T07:23:01Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
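As an illustrative sketch of that setting (not the paper's analysis or guarantee), the code below fits a single ReLU neuron $x \mapsto \mathrm{relu}(\langle w, x \rangle)$ by gradient descent on the empirical square loss; the data model, initialization, step size, and iteration count are arbitrary assumptions.

```python
import numpy as np

# Illustrative sketch of the single-neuron setting: fit x -> relu(<w, x>) by
# gradient descent on the empirical square loss. The data model, step size,
# and iteration count are arbitrary choices, not the paper's.

rng = np.random.default_rng(6)
n, d, lr, steps = 2000, 10, 0.1, 500

w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star, 0.0) + 0.05 * rng.normal(size=n)   # noisy labels

w = 0.01 * rng.normal(size=d)     # small random init (all-zeros would stall ReLU GD)
for _ in range(steps):
    pred = np.maximum(X @ w, 0.0)
    # Gradient of (1/n) * sum_i 0.5 * (relu(<w, x_i>) - y_i)^2,
    # using the almost-everywhere derivative of relu.
    grad = X.T @ ((pred - y) * (X @ w > 0)) / n
    w -= lr * grad

print("final empirical square loss:",
      0.5 * np.mean((np.maximum(X @ w, 0.0) - y) ** 2))
```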
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.