Apple Tasting: Combinatorial Dimensions and Minimax Rates
- URL: http://arxiv.org/abs/2310.19064v3
- Date: Tue, 18 Jun 2024 21:54:41 GMT
- Title: Apple Tasting: Combinatorial Dimensions and Minimax Rates
- Authors: Vinod Raman, Unique Subedi, Ananth Raman, Ambuj Tewari
- Abstract summary: In online binary classification under \emph{apple tasting} feedback, the learner only observes the true label if it predicts ``1".
We show that in the realizable setting, the expected number of mistakes can be $\Theta(1)$, $\Theta(\sqrt{T})$, or $\Theta(T)$.
- Score: 16.52706654413988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In online binary classification under \emph{apple tasting} feedback, the learner only observes the true label if it predicts ``1". First studied by \cite{helmbold2000apple}, we revisit this classical partial-feedback setting and study online learnability from a combinatorial perspective. We show that the Littlestone dimension continues to provide a tight quantitative characterization of apple tasting in the agnostic setting, closing an open question posed by \cite{helmbold2000apple}. In addition, we give a new combinatorial parameter, called the Effective width, that tightly quantifies the minimax expected mistakes in the realizable setting. As a corollary, we use the Effective width to establish a \emph{trichotomy} of the minimax expected number of mistakes in the realizable setting. In particular, we show that in the realizable setting, the expected number of mistakes of any learner, under apple tasting feedback, can be $\Theta(1), \Theta(\sqrt{T})$, or $\Theta(T)$. This is in contrast to the full-information realizable setting where only $\Theta(1)$ and $\Theta(T)$ are possible.
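The asymmetry of the feedback model is easy to see in code. Below is a minimal sketch of the apple tasting protocol described in the abstract; `learner` and `adversary` are hypothetical interfaces introduced purely for illustration, not anything the paper defines.
```python
# A minimal sketch of the apple tasting protocol (hypothetical interfaces).

def apple_tasting_game(learner, adversary, T):
    """Run T rounds of online binary classification with apple tasting feedback."""
    mistakes = 0
    for _ in range(T):
        x = adversary.next_instance()   # adversary reveals an instance
        y_hat = learner.predict(x)      # learner commits to a label in {0, 1}
        y = adversary.true_label(x)     # true label, hidden from the learner
        mistakes += int(y_hat != y)     # every mistake counts, observed or not
        if y_hat == 1:
            learner.update(x, y)        # the label is revealed ONLY on a "1"
        # on a "0" prediction the learner receives no feedback at all
    return mistakes
```
The trichotomy concerns the minimax value of exactly this game: depending on the class, the best achievable expected mistake count grows as $\Theta(1)$, $\Theta(\sqrt{T})$, or $\Theta(T)$.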
Related papers
- Annotation Efficiency: Identifying Hard Samples via Blocked Sparse Linear Bandits [23.329605738829947]
This paper considers the problem of annotating datapoints using an expert with only a few annotation rounds in a label-scarce setting.
We propose soliciting reliable feedback from the expert on how difficult a datapoint is to annotate, in addition to its ground-truth label.
arXiv Detail & Related papers (2024-10-26T01:42:03Z)
- Deterministic Apple Tasting [2.4554686192257424]
We provide the first widely-applicable deterministic apple tasting learner.
We prove a trichotomy stating that every class $\mathcal{H}$ must be either easy, hard, or unlearnable.
Our upper bound is based on a deterministic algorithm for learning from expert advice with apple tasting feedback.
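To illustrate the constraint such a deterministic learner faces (this is a generic sketch, not the paper's algorithm), consider a version-space learner over a finite class: it can only prune hypotheses on rounds where it predicts ``1", since those are the only rounds with feedback.
```python
# A hedged sketch of deterministic learning with apple tasting feedback over a
# finite hypothesis class, assuming realizability. Not the paper's algorithm.

def deterministic_apple_tasting(hypotheses, stream):
    """hypotheses: list of callables x -> {0, 1}; stream: iterable of (x, y)."""
    version_space = list(hypotheses)
    for x, y in stream:
        votes = sum(h(x) for h in version_space)
        y_hat = 1 if 2 * votes >= len(version_space) else 0  # majority vote
        if y_hat == 1:
            # the true label y is revealed; eliminate inconsistent hypotheses
            version_space = [h for h in version_space if h(x) == y]
        # a "0" prediction yields no feedback, so the version space is unchanged
        yield y_hat
```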
arXiv Detail & Related papers (2024-10-14T11:54:46Z)
- The Real Price of Bandit Information in Multiclass Classification [73.17969992976501]
We revisit the classical problem of multiclass classification with bandit feedback.
We present a new bandit classification algorithm that guarantees regret $\widetilde{O}(|H| + \sqrt{T})$.
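For contrast with apple tasting, here is a hedged sketch of what "bandit feedback" means in multiclass prediction; all names are hypothetical.
```python
# Sketch of the multiclass bandit feedback loop (hypothetical interfaces):
# the learner picks one of k labels and observes only whether it was correct.

def bandit_multiclass_game(learner, stream, k):
    mistakes = 0
    for x, y in stream:
        y_hat = learner.predict(x)         # prediction in {0, ..., k-1}
        correct = (y_hat == y)             # the ONLY feedback revealed
        mistakes += int(not correct)
        learner.update(x, y_hat, correct)  # the true label y stays hidden
    return mistakes
```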
arXiv Detail & Related papers (2024-05-16T12:11:09Z)
- Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z)
- Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks [93.00280593719513]
We study high-dimensional multi-armed contextual bandits with batched feedback where the $T$ steps of online interactions are divided into $L$ batches.
Specifically, each batch collects data according to a policy that depends on previous batches, and the rewards are revealed only at the end of the batch.
Our algorithm achieves regret bounds comparable to those in the fully sequential setting with only $\mathcal{O}(\log T)$ batches.
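The batched interaction pattern can be sketched as follows; `bandit` and `policy` are hypothetical interfaces, and the batch schedule is an assumption for illustration.
```python
# Sketch of batched feedback (hypothetical interfaces): actions within a batch
# follow a policy fixed from earlier batches; rewards arrive at the batch end.

def run_batched_bandit(bandit, policy, batch_sizes):
    """batch_sizes: list of L batch lengths that together sum to T."""
    history = []
    for size in batch_sizes:
        batch = []
        for _ in range(size):
            context = bandit.next_context()
            action = policy.choose(context)    # depends only on past batches
            batch.append((context, action))
        rewards = bandit.reveal_rewards(batch) # withheld until the batch ends
        history.extend(zip(batch, rewards))
        policy = policy.updated(history)       # policy may change between batches
    return history
```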
arXiv Detail & Related papers (2023-11-22T06:06:54Z)
- A Trichotomy for Transductive Online Learning [32.03948071550447]
We present new upper and lower bounds on the number of learner mistakes in the \emph{transductive} online learning setting of Ben-David, Kushilevitz and Mansour (1997).
This setting is similar to standard online learning, except that the adversary fixes a sequence of instances to be labeled at the start of the game, and this sequence is known to the learner.
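A sketch of that protocol (hypothetical interfaces): the instance sequence is fixed and shown to the learner before any labels arrive.
```python
# Sketch of transductive online learning (hypothetical interfaces): the full
# instance sequence is known up front; only the labels are revealed online.

def transductive_online_game(learner, instances, labels):
    learner.observe_sequence(instances)  # entire sequence disclosed in advance
    mistakes = 0
    for x, y in zip(instances, labels):
        y_hat = learner.predict(x)
        mistakes += int(y_hat != y)
        learner.update(x, y)             # full-information feedback each round
    return mistakes
```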
arXiv Detail & Related papers (2023-11-10T23:27:23Z)
- Towards the Fundamental Limits of Knowledge Transfer over Finite Domains [8.575522204707957]
We show that privileged information at three progressive levels accelerates the transfer.
At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate $\sqrt{|\mathcal{S}\mathcal{A}|/n}$.
The third level further equips the student with the soft labels (complete logits) on $\mathcal{A}$ given every sampled input, which provably enables the student to enjoy a rate $|\mathcal{S}|/n$ free of $|\mathcal{A}|$.
arXiv Detail & Related papers (2023-10-11T19:30:08Z)
- Self-Directed Linear Classification [50.659479930171585]
In online classification, a learner observes a sequence of examples and aims to predict their labels in an online fashion so as to minimize the total number of mistakes.
Here we study the power of choosing the prediction order and establish the first strong separation between worst-order and random-order learning.
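In the self-directed variant sketched below (hypothetical interfaces), the learner also picks which unlabeled point to predict next, which is the source of the separation mentioned above.
```python
# Sketch of self-directed online classification (hypothetical interfaces):
# the learner sees the unlabeled pool and chooses its own prediction order.

def self_directed_game(learner, examples):
    """examples: list of (x, y) pairs; labels are revealed after predicting."""
    pool = {i: x for i, (x, _) in enumerate(examples)}
    labels = {i: y for i, (_, y) in enumerate(examples)}
    mistakes = 0
    while pool:
        i = learner.choose_next(pool)    # learner selects the next index
        x = pool.pop(i)
        y_hat = learner.predict(x)
        mistakes += int(y_hat != labels[i])
        learner.update(x, labels[i])     # label revealed after the prediction
    return mistakes
```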
arXiv Detail & Related papers (2023-08-06T15:38:44Z)
- High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation [89.21686761957383]
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer network.
Our results demonstrate that even one step can lead to a considerable advantage over random features.
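A minimal numpy sketch of that single step, under assumed choices (ReLU activation, half mean-squared loss, second layer held fixed) that are ours for illustration, not necessarily the paper's:
```python
import numpy as np

# One gradient step on the first-layer weights W of a two-layer network
# f(x) = a^T relu(W x). Activation and loss here are illustrative assumptions.

def one_gradient_step(X, y, W, a, lr):
    """X: (n, d) inputs, y: (n,) targets, W: (m, d), a: (m,) second layer."""
    Z = X @ W.T                          # pre-activations, shape (n, m)
    H = np.maximum(Z, 0.0)               # ReLU features
    resid = H @ a - y                    # residuals of the network outputs
    # gradient of (1/2) * mean squared loss w.r.t. W, with a held fixed
    grad_W = ((resid[:, None] * (Z > 0) * a[None, :]).T @ X) / len(y)
    return W - lr * grad_W               # W after a single gradient step
```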
arXiv Detail & Related papers (2022-05-03T12:09:59Z)
- Bandit Phase Retrieval [45.12888201134656]
We prove that the minimax cumulative regret in this problem is $\tilde{\Theta}(d\sqrt{n})$.
We also show that the minimax simple regret is $\tilde{\Theta}(d/\sqrt{n})$ and that this is only achievable by an adaptive algorithm.
arXiv Detail & Related papers (2021-06-03T08:04:33Z)
- Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T},\ \sqrt{\mathcal{E}}\, T^{1/3}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.