Self-Directed Linear Classification
- URL: http://arxiv.org/abs/2308.03142v1
- Date: Sun, 6 Aug 2023 15:38:44 GMT
- Title: Self-Directed Linear Classification
- Authors: Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis
- Abstract summary: In online classification, a learner aims to predict the labels of a sequence of examples in an online fashion so as to minimize the total number of mistakes.
Here we study the power of choosing the prediction order and establish the first strong separation between worst-order and random-order learning.
- Score: 50.659479930171585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In online classification, a learner is presented with a sequence of examples
and aims to predict their labels in an online fashion so as to minimize the
total number of mistakes. In the self-directed variant, the learner knows in
advance the pool of examples and can adaptively choose the order in which
predictions are made. Here we study the power of choosing the prediction order
and establish the first strong separation between worst-order and random-order
learning for the fundamental task of linear classification. Prior to our work,
such a separation was known only for very restricted concept classes, e.g.,
one-dimensional thresholds or axis-aligned rectangles.
We present two main results. If $X$ is a dataset of $n$ points drawn
uniformly at random from the $d$-dimensional unit sphere, we design an
efficient self-directed learner that makes $O(d \log \log(n))$ mistakes and
classifies the entire dataset. If $X$ is an arbitrary $d$-dimensional dataset
of size $n$, we design an efficient self-directed learner that predicts the
labels of $99\%$ of the points in $X$ with mistake bound independent of $n$. In
contrast, under a worst- or random-ordering, the number of mistakes must be at
least $\Omega(d \log n)$, even when the points are drawn uniformly from the
unit sphere and the learner only needs to predict the labels for $1\%$ of them.
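To make the self-directed setting concrete, here is a minimal toy sketch (not the paper's actual algorithm): a perceptron-style learner that knows the whole pool in advance and, at each round, chooses to predict the remaining point on which its current hypothesis has the largest unsigned margin, versus a baseline that predicts in a uniformly random order. All names and the greedy ordering rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 10, 2000
# Pool of n points drawn uniformly from the d-dimensional unit sphere.
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

w_star = rng.normal(size=d)             # hidden target halfspace (illustrative)
y = np.sign(X @ w_star)

def run(order_fn):
    """Online perceptron over the whole pool; order_fn picks the next index."""
    w = np.zeros(d)
    remaining = list(range(n))
    mistakes = 0
    while remaining:
        i = order_fn(w, X, remaining)
        pred = 1.0 if w @ X[i] >= 0 else -1.0
        if pred != y[i]:
            mistakes += 1
            w += y[i] * X[i]            # perceptron update on a mistake
        remaining.remove(i)
    return mistakes

def self_directed(w, X, remaining):
    # Greedy self-directed order: predict the remaining point the current
    # hypothesis is most confident about (largest unsigned margin).
    return max(remaining, key=lambda i: abs(w @ X[i]))

def random_order(w, X, remaining):
    # Random-order baseline.
    return remaining[rng.integers(len(remaining))]

print("self-directed mistakes:", run(self_directed))
print("random-order mistakes:", run(random_order))
```

The point of the sketch is only the protocol: the self-directed learner defers low-confidence points until its hypothesis has improved, whereas the random-order learner cannot. The paper's $O(d \log \log n)$ bound relies on a more careful algorithm than this greedy rule.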
Related papers
- Transfer Learning for Latent Variable Network Models [18.31057192626801]
We study transfer learning for estimation in latent variable network models.
We show that if the latent variables are shared, then vanishing error is possible.
Our algorithm achieves $o(1)$ error and does not assume a parametric form on the source or target networks.
arXiv Detail & Related papers (2024-06-05T16:33:30Z)
- Online Learning of Halfspaces with Massart Noise [47.71073318490341]
We study the task of online learning in the presence of Massart noise.
We present a computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$.
We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k) \Delta T - o(T)$ bigger than choosing a random action at every round.
arXiv Detail & Related papers (2024-05-21T17:31:10Z)
- High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation [89.21686761957383]
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer network.
Our results demonstrate that even one step can lead to a considerable advantage over random features.
arXiv Detail & Related papers (2022-05-03T12:09:59Z)
- Online Selective Classification with Limited Feedback [82.68009460301585]
We study selective classification in the online learning model, wherein a predictor may abstain from classifying an instance.
Two salient aspects of the setting we consider are that the data may be non-realisable, due to which abstention may be a valid long-term action, and that feedback is only received when the learner abstains.
We construct simple versioning-based schemes for any $\mu \in (0,1]$ that make at most $T^\mu$ mistakes while incurring $\tilde{O}(T^{1-\mu})$ excess abstention against adaptive adversaries.
arXiv Detail & Related papers (2021-10-27T08:00:53Z)
- Littlestone Classes are Privately Online Learnable [28.04975353867202]
We consider the problem of online classification under a privacy constraint.
In this setting a learner observes sequentially a stream of labelled examples $(x_t, y_t)$, for $1 \leq t \leq T$, and returns at each iteration a hypothesis $h_t$ which is used to predict the label of each new example $x_t$.
The learner's performance is measured by her regret against a known hypothesis class $\mathcal{H}$.
arXiv Detail & Related papers (2021-06-25T09:08:33Z)
- Locality defeats the curse of dimensionality in convolutional teacher-student scenarios [69.2027612631023]
We show that locality is key in determining the learning curve exponent $\beta$.
We conclude by proving, using a natural assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.
arXiv Detail & Related papers (2021-06-16T08:27:31Z)
- Remember What You Want to Forget: Algorithms for Machine Unlearning [37.656345901739506]
We study the problem of forgetting datapoints from a learnt model.
We provide an unlearning algorithm that can delete up to $O(n/d^{1/4})$ samples.
arXiv Detail & Related papers (2021-03-04T19:28:57Z)
- Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and Horizon-Free Bound for Linear Mixture MDP [76.94328400919836]
We show how to construct variance-aware confidence sets for linear bandits and linear mixture Markov Decision Processes (MDPs).
For linear bandits, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{1 + \sum_{i=1}^{K} \sigma_i^2})$ regret bound, where $d$ is the feature dimension.
For linear mixture MDPs, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{K})$ regret bound, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-29T18:57:52Z)
- The generalization error of max-margin linear classifiers: Benign overfitting and high dimensional asymptotics in the overparametrized regime [11.252856459394854]
Modern machine learning classifiers often exhibit vanishing classification error on the training set.
Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data.
arXiv Detail & Related papers (2019-11-05T00:15:27Z)