Active Local Learning
- URL: http://arxiv.org/abs/2008.13374v2
- Date: Fri, 4 Sep 2020 02:58:39 GMT
- Title: Active Local Learning
- Authors: Arturs Backurs, Avrim Blum, Neha Gupta
- Abstract summary: We consider active local learning: given a query point $x$, and active access to an unlabeled training set $S$, output the prediction $h(x)$ of a near-optimal $h \in H$.
In particular, the number of label queries should be independent of the complexity of $H$, and the function $h$ should be well-defined, independent of $x$.
This immediately also implies an algorithm for distance estimation: estimating the value $opt(H)$ from many fewer labels than needed to actually learn a near-optimal $h \in H$.
- Score: 22.823171570933397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we consider active local learning: given a query point $x$, and
active access to an unlabeled training set $S$, output the prediction $h(x)$ of
a near-optimal $h \in H$ using significantly fewer labels than would be needed
to actually learn $h$ fully. In particular, the number of label queries should
be independent of the complexity of $H$, and the function $h$ should be
well-defined, independent of $x$. This immediately also implies an algorithm
for distance estimation: estimating the value $opt(H)$ from many fewer labels
than needed to actually learn a near-optimal $h \in H$, by running local
learning on a few random query points and computing the average error.
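
As a rough illustration of this reduction from distance estimation to local learning, the sketch below assumes a hypothetical `local_predict(x, unlabeled_pool)` oracle that returns $h(x)$ for one fixed near-optimal $h$; the oracle, the query budget `m`, and the absolute loss are illustrative placeholders, not the paper's construction:

```python
def estimate_opt(local_predict, sample_point, label_oracle, unlabeled_pool, m=100):
    """Estimate opt(H): query the local learner at m random points, compare its
    predictions with fresh labels, and average the per-point errors.  Because the
    local learner answers consistently with a single well-defined h, the average
    error concentrates around err(h), i.e. opt(H) up to an additive epsilon."""
    errors = []
    for _ in range(m):
        x = sample_point()                        # random query point from the data distribution
        y_hat = local_predict(x, unlabeled_pool)  # local learner's prediction h(x)
        y = label_oracle(x)                       # one label for the comparison
        errors.append(abs(y_hat - y))             # absolute loss; any bounded loss would do
    return sum(errors) / m
```

The labels spent inside `local_predict` are the ones the paper keeps independent of the complexity of $H$.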
For the hypothesis class consisting of functions supported on the interval
$[0,1]$ with Lipschitz constant bounded by $L$, we present an algorithm that
makes $O(({1 / \epsilon^6}) \log(1/\epsilon))$ label queries from an unlabeled
pool of $O(({L / \epsilon^4})\log(1/\epsilon))$ samples. It estimates the
distance to the best hypothesis in the class to an additive error of $\epsilon$
for an arbitrary underlying distribution. We further generalize our algorithm
to more than one dimension. We emphasize that the number of labels used is
independent of the complexity of the hypothesis class, which depends on $L$.
Furthermore, we give an algorithm to locally estimate the values of a
near-optimal function at a few query points of interest using a number of labels
independent of $L$.
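
For contrast, a naive baseline that actually learns an $L$-Lipschitz function on $[0,1]$ needs a label budget that grows with $L$; the sketch below (the bin width, the repeats per bin, and the piecewise-constant fit are back-of-the-envelope choices, not the paper's method) makes that dependence explicit:

```python
import math
import random

def naive_lipschitz_learner(label_oracle, L=10.0, eps=0.1):
    """Baseline regressogram: split [0,1] into ~L/eps bins so an L-Lipschitz target
    varies by at most eps within a bin, then average ~1/eps^2 noisy labels per bin.
    Total labels ~ L/eps^3, i.e. linear in L -- exactly the dependence the paper's
    O((1/eps^6) log(1/eps)) label bound avoids."""
    n_bins = max(1, math.ceil(L / eps))
    labels_per_bin = math.ceil(1.0 / eps ** 2)
    bin_means = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        ys = [label_oracle(random.uniform(lo, hi)) for _ in range(labels_per_bin)]
        bin_means.append(sum(ys) / labels_per_bin)

    def h(x):
        return bin_means[min(n_bins - 1, int(x * n_bins))]

    return h, n_bins * labels_per_bin  # learned predictor and total label cost
```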
We also consider the related problem of approximating the minimum error that
can be achieved by the Nadaraya-Watson estimator under a linear diagonal
transformation with eigenvalues coming from a small range. For a
$d$-dimensional pointset of size $N$, our algorithm achieves an additive
approximation of $\epsilon$, makes $\tilde{O}({d}/{\epsilon^2})$ queries and
runs in $\tilde{O}({d^2}/{\epsilon^{d+4}}+{dN}/{\epsilon^2})$ time.
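
For reference, the Nadaraya-Watson estimator predicts at a point by a kernel-weighted average of the labels of nearby points; the sketch below evaluates its leave-one-out error under a diagonal rescaling of the coordinates and brute-forces a small grid of eigenvalues. The Gaussian kernel, the grid, and the exhaustive $N^2$ computation are illustrative choices only, not the paper's query-efficient algorithm:

```python
import numpy as np
from itertools import product

def nw_loo_error(X, y, scales):
    """Leave-one-out squared error of Nadaraya-Watson after the map x -> scales * x."""
    Z = X * scales                                    # diagonal linear transformation
    sq_dists = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    W = np.exp(-0.5 * sq_dists)                       # Gaussian kernel weights
    np.fill_diagonal(W, 0.0)                          # exclude each point from its own prediction
    preds = (W @ y) / np.clip(W.sum(axis=1), 1e-12, None)
    return float(np.mean((preds - y) ** 2))

def min_nw_error_over_diagonal(X, y, eig_grid):
    """Brute-force minimum over diagonal transformations with eigenvalues from a small
    grid; cost is |grid|^d * N^2 pairs, which the paper's algorithm improves on."""
    d = X.shape[1]
    return min(nw_loo_error(X, y, np.array(combo)) for combo in product(eig_grid, repeat=d))
```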
Related papers
- A spectral least-squares-type method for heavy-tailed corrupted
regression with unknown covariance \& heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression, assuming a corrupted $n$-sized label-feature sample with at most $\epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma, \Xi)$ nor the operator norm of $\Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z) - List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering [42.526664955704746]
We develop a novel, conceptually simpler technique for list-decodable sparse mean estimation.
In particular, for distributions with "certifiably bounded" $t$-th moments in $k$-sparse directions, our algorithm achieves error of $(1/\alpha)^{O(1/t)}$ with sample complexity $m = (k\log(n))^{O(t)}/\alpha$ and running time $\mathrm{poly}(mn^t)$.
For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $\Theta(\sqrt{\log(1/\alpha)})$.
arXiv Detail & Related papers (2022-06-10T17:38:18Z) - Semi-supervised Active Regression [21.51757844385258]
This paper studies the use of partially labelled, potentially biased data for learning tasks.
The learner has access to a dataset $X \in \mathbb{R}^{n \times d}$ with labels $Y$ and aims to solve $\min_{\beta \in \mathbb{R}^d} \|X\beta - Y\|_2^2$ while making few additional label queries.
arXiv Detail & Related papers (2021-06-12T03:28:43Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K \subset \mathbb{R}^d$, given access to $A \in \mathbb{R}^{d \times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic variant of $Q$-learning enjoys a $\mathcal{O}\left(\frac{SA \cdot \mathrm{poly}(H)}{\Delta_{\min}} \log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we give an improved algorithm with a smaller sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Agnostic Q-learning with Function Approximation in Deterministic
Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O\left(\dim_E\right)$ trajectories.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.