Adaptive approximation of monotone functions
- URL: http://arxiv.org/abs/2309.07530v1
- Date: Thu, 14 Sep 2023 08:56:31 GMT
- Title: Adaptive approximation of monotone functions
- Authors: Pierre Gaillard (Thoth), Sébastien Gerchinovitz (IMT), Étienne de Montbrun (TSE-R)
- Abstract summary: We prove that GreedyBox achieves an optimal sample complexity for any function $f$, up to logarithmic factors.
Perhaps as expected, the $L^p(\mu)$ error of GreedyBox decreases much faster for piecewise-$C^2$ functions than predicted by the algorithm.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the classical problem of approximating a non-decreasing function $f:
\mathcal{X} \to \mathcal{Y}$ in $L^p(\mu)$ norm by sequentially querying its
values, for known compact real intervals $\mathcal{X}$, $\mathcal{Y}$ and a
known probability measure $\mu$ on $\mathcal{X}$. For any function $f$ we characterize
the minimum number of evaluations of $f$ that algorithms need to guarantee an
approximation $\hat{f}$ with an $L^p(\mu)$ error below $\epsilon$ after
stopping. Unlike worst-case results that hold uniformly over all $f$, our
complexity measure is dependent on each specific function $f$. To address this
problem, we introduce GreedyBox, a generalization of an algorithm originally
proposed by Novak (1992) for numerical integration. We prove that GreedyBox
achieves an optimal sample complexity for any function $f$, up to logarithmic
factors. Additionally, we uncover results regarding piecewise-smooth functions.
Perhaps as expected, the $L^p(\mu)$ error of GreedyBox decreases much faster
for piecewise-$C^2$ functions than predicted by the algorithm (without any
knowledge of the smoothness of $f$). A simple modification even achieves
optimal minimax approximation rates for such functions, which we compute
explicitly. In particular, our findings highlight multiple performance gaps
between adaptive and non-adaptive algorithms, smooth and piecewise-smooth
functions, as well as monotone or non-monotone functions. Finally, we provide
numerical experiments to support our theoretical results.
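The abstract names GreedyBox but does not spell out its selection rule. As an illustration only, here is a minimal Python sketch of a Novak-style greedy box refinement for a non-decreasing $f$, specialized to $p = 1$ and $\mu$ the Lebesgue measure on $[a, b]$; the name `greedy_box`, the midpoint bisection, and the stopping rule are assumptions made for this sketch, not the paper's exact procedure.

```python
import heapq

def greedy_box(f, a=0.0, b=1.0, eps=1e-3, max_queries=10_000):
    """Greedy box refinement for a non-decreasing f on [a, b] (p = 1, Lebesgue).

    Since f is non-decreasing, its graph over [x0, x1] stays inside the box
    [x0, x1] x [f(x0), f(x1)], so the constant (f(x0) + f(x1)) / 2 approximates
    f on that interval with L^1 error at most (x1 - x0) * (f(x1) - f(x0)) / 2.
    We repeatedly bisect the interval with the largest such error bound.
    """
    def bound(x0, x1, y0, y1):
        return (x1 - x0) * (y1 - y0) / 2.0

    ya, yb = f(a), f(b)
    queries = 2
    # Max-heap via negated bounds: entries are (-bound, x0, x1, f(x0), f(x1)).
    heap = [(-bound(a, b, ya, yb), a, b, ya, yb)]
    total = bound(a, b, ya, yb)  # global L^1 error bound, kept incrementally
    while total > eps and queries < max_queries:
        neg, x0, x1, y0, y1 = heapq.heappop(heap)
        xm = 0.5 * (x0 + x1)
        ym = f(xm)  # the single new query of this round
        queries += 1
        left = bound(x0, xm, y0, ym)
        right = bound(xm, x1, ym, y1)
        total += left + right - (-neg)  # replace parent bound by children's
        heapq.heappush(heap, (-left, x0, xm, y0, ym))
        heapq.heappush(heap, (-right, xm, x1, ym, y1))
    # Piecewise-constant approximation: one midpoint value per interval.
    pieces = sorted((x0, x1, 0.5 * (y0 + y1)) for _, x0, x1, y0, y1 in heap)
    return pieces, queries
```

For example, `greedy_box(lambda x: x * x, 0.0, 1.0, eps=1e-3)` returns a piecewise-constant approximation together with the number of queries spent; the greedy rule concentrates queries where the monotone increase of $f$ is largest, which is the kind of adaptivity the paper's function-dependent complexity measure captures.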
Related papers
- Dueling Optimization with a Monotone Adversary [35.850072415395395]
We study the problem of dueling optimization with a monotone adversary, which is a generalization of (noiseless) dueling convex optimization.
The goal is to design an online algorithm to find a minimizer $\mathbf{x}^*$ for a function $f \colon \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subseteq \mathbb{R}^d$.
arXiv Detail & Related papers (2023-11-18T23:55:59Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Agnostic proper learning of monotone functions: beyond the black-box
correction barrier [6.47243430672461]
Given $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$ uniformly random examples of an unknown function $f:\{\pm 1\}^n \rightarrow \{\pm 1\}$, our algorithm outputs a hypothesis $g:\{\pm 1\}^n \rightarrow \{\pm 1\}$ that is monotone.
We also give an algorithm for estimating up to additive error $\varepsilon$ the distance of an unknown function $f$ to monotone using a run-time of $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$.
arXiv Detail & Related papers (2023-04-05T18:52:10Z) - Deterministic Nonsmooth Nonconvex Optimization [94.01526844386977]
We show that randomization is necessary to obtain a dimension-free algorithm.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z) - Near-Optimal Non-Convex Stochastic Optimization under Generalized
Smoothness [21.865728815935665]
Two recent works established the $O(\epsilon^{-3})$ sample complexity to obtain an $O(\epsilon)$-stationary point.
However, both require a large batch size on the order of $\mathrm{poly}(\epsilon^{-1})$, which is not only computationally burdensome but also unsuitable for streaming applications.
In this work, we solve the prior two problems simultaneously by revisiting a simple variant of the STORM algorithm.
arXiv Detail & Related papers (2023-02-13T00:22:28Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We propose a method to solve such problems, measured in terms of master-to-agents communications and local calls.
The complexity of the proposed method is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ rate.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Near-Optimal Lower Bounds For Convex Optimization For All Orders of
Smoothness [26.71898403195793]
We study the complexity of optimizing highly smooth convex functions.
For a positive integer $p$, we want to find an $\epsilon$-approximate minimum of a convex function $f$.
We prove a new lower bound that matches this bound (up to log factors), and holds not only for randomized algorithms, but also for quantum algorithms.
arXiv Detail & Related papers (2021-12-02T10:51:43Z) - Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of $M$ such that $\det(\sum_{i \in S} v_i v_i^\top)$ is maximized.
This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
arXiv Detail & Related papers (2020-04-16T19:16:38Z)