Pareto Active Learning with Gaussian Processes and Adaptive
Discretization
- URL: http://arxiv.org/abs/2006.14061v2
- Date: Mon, 1 Nov 2021 11:26:34 GMT
- Title: Pareto Active Learning with Gaussian Processes and Adaptive
Discretization
- Authors: Andi Nika, Kerem Bozgan, Sepehr Elahi, Çağın Ararat, Cem Tekin
- Abstract summary: We propose an algorithm that exploits the smoothness of the GP-sampled function and the structure of $({\cal X},d)$ to learn fast.
In essence, Adaptive $\boldsymbol{\epsilon}$-PAL employs a tree-based adaptive discretization technique to identify an $\boldsymbol{\epsilon}$-accurate Pareto set of designs.
- Score: 12.179548969182573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of optimizing a vector-valued objective function
$\boldsymbol{f}$ sampled from a Gaussian Process (GP) whose index set is a
well-behaved, compact metric space $({\cal X},d)$ of designs. We assume that
$\boldsymbol{f}$ is not known beforehand and that evaluating $\boldsymbol{f}$
at design $x$ results in a noisy observation of $\boldsymbol{f}(x)$. Since
identifying the Pareto optimal designs via exhaustive search is infeasible when
the cardinality of ${\cal X}$ is large, we propose an algorithm, called
Adaptive $\boldsymbol{\epsilon}$-PAL, that exploits the smoothness of the
GP-sampled function and the structure of $({\cal X},d)$ to learn fast. In
essence, Adaptive $\boldsymbol{\epsilon}$-PAL employs a tree-based adaptive
discretization technique to identify an $\boldsymbol{\epsilon}$-accurate Pareto
set of designs in as few evaluations as possible. We provide both
information-type and metric dimension-type bounds on the sample complexity of
$\boldsymbol{\epsilon}$-accurate Pareto set identification. We also
experimentally show that our algorithm outperforms other Pareto set
identification methods on several benchmark datasets.
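The abstract's core idea, identifying an $\boldsymbol{\epsilon}$-accurate Pareto set by maintaining GP confidence regions for each objective and evaluating only designs that are still undecided, can be illustrated with a small sketch. The Python below is a minimal, assumption-laden toy: it works on a fixed finite design set with two objectives, uses independent RBF-kernel GPs, and applies a simplified confidence-region discarding rule. It is not the paper's Adaptive $\boldsymbol{\epsilon}$-PAL (notably, the tree-based adaptive discretization of $({\cal X},d)$ is omitted), and every function name and parameter here is illustrative.

```python
# Toy sketch of confidence-region-based Pareto active learning with GPs.
# NOT the paper's Adaptive epsilon-PAL: the design set is a fixed finite grid
# (no tree-based adaptive discretization) and the discard rule is simplified.
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: 1-D design space, two objectives to maximize, noisy evaluations.
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
F = np.column_stack([np.sin(3.0 * X[:, 0]), np.cos(3.0 * X[:, 0])])
noise_std = 0.05


def rbf(A, B, length_scale=0.2):
    """Squared-exponential kernel between two sets of 1-D designs."""
    sq = (A[:, 0][:, None] - B[:, 0][None, :]) ** 2
    return np.exp(-0.5 * sq / length_scale ** 2)


def gp_posterior(X_train, y_train, X_test):
    """Exact GP regression posterior mean and std for a single objective."""
    K = rbf(X_train, X_train) + noise_std ** 2 * np.eye(len(X_train))
    K_s = rbf(X_test, X_train)
    mean = K_s @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)  # k(x, x) = 1
    return mean, np.sqrt(np.clip(var, 1e-12, None))


eps, beta, budget = 0.05, 2.0, 30       # accuracy, confidence scaling, eval budget
train_idx, train_y = [], []
undecided = set(range(len(X)))          # designs not yet ruled out

for t in range(budget):
    if train_idx:
        Xtr, Ytr = X[np.array(train_idx)], np.array(train_y)
        mean, std = np.zeros((len(X), 2)), np.zeros((len(X), 2))
        for m in range(2):              # independent GP per objective
            mean[:, m], std[:, m] = gp_posterior(Xtr, Ytr[:, m], X)
    else:                               # prior: zero mean, unit std
        mean, std = np.zeros((len(X), 2)), np.ones((len(X), 2))

    lower, upper = mean - beta * std, mean + beta * std

    # Discard a design if another surviving design's pessimistic outcome
    # eps-dominates its optimistic outcome in every objective.
    active, discard = sorted(undecided), set()
    for i in active:
        for j in active:
            if j != i and j not in discard and np.all(lower[j] + eps >= upper[i]):
                discard.add(i)
                break
    undecided -= discard

    # Stop once every surviving confidence region is narrower than eps.
    widths = {i: float((upper[i] - lower[i]).max()) for i in undecided}
    if max(widths.values()) <= eps:
        break

    # Otherwise evaluate the most uncertain surviving design.
    i_next = max(widths, key=widths.get)
    train_idx.append(i_next)
    train_y.append(F[i_next] + noise_std * rng.standard_normal(2))

print(f"evaluations used: {len(train_idx)}, predicted eps-Pareto designs: {len(undecided)}")
```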
Related papers
- Pruning is Optimal for Learning Sparse Features in High-Dimensions [15.967123173054535]
We show that a class of statistical models can be optimally learned using pruned neural networks trained with gradient descent.
We show that pruning neural networks proportional to the sparsity level of $\boldsymbol{V}$ improves their sample complexity compared to unpruned networks.
arXiv Detail & Related papers (2024-06-12T21:43:12Z)
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d$ ...
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Compressive Recovery of Sparse Precision Matrices [5.557600489035657]
We consider the problem of learning a graph modeling the statistical relations of the $d$ variables from a dataset with $n$ samples $X \in \mathbb{R}^{n \times d}$.
We show that it is possible to estimate it from a sketch of size $m = \Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph.
We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser.
arXiv Detail & Related papers (2023-11-08T13:29:08Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Max-Margin Token Selection in Attention Mechanism [34.11955962489916]
We prove that running gradient descent on $\boldsymbol{p}$ or, equivalently, $\boldsymbol{W}$ converges in direction to a max-margin solution.
Remarkably, our results are applicable to general data and precisely characterize the optimal token selection.
arXiv Detail & Related papers (2023-06-23T16:35:46Z)
- Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions.
We generalize the method so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of the optimal bound in the smooth nonconvex regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z)
- Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z)
- Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z)
- A One-bit, Comparison-Based Gradient Estimator [29.600975900977343]
We show how one can use tools from one-bit compressed sensing to construct a robust and reliable estimator of the normalized gradient.
We propose an algorithm, coined SCOBO, that uses this estimator within a gradient descent scheme.
arXiv Detail & Related papers (2020-10-06T05:01:38Z)
- Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)