Active Cost-aware Labeling of Streaming Data
- URL: http://arxiv.org/abs/2304.06808v2
- Date: Tue, 4 Jul 2023 22:09:07 GMT
- Title: Active Cost-aware Labeling of Streaming Data
- Authors: Ting Cai, Kirthevasan Kandasamy
- Abstract summary: We study actively labeling streaming data, where an active learner is faced with a stream of data points and must carefully choose which of these points to label via an expensive experiment.
We first study a setting when the data's inputs belong to one of $K$ discrete distributions and formalize this problem via a loss that captures the labeling cost and the prediction error.
When the labeling cost is $B$, our algorithm, which chooses to label a point if the uncertainty is larger than a time and cost dependent threshold, achieves a worst-case upper bound of $widetildeO(Bfrac1
- Score: 11.501619634838312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study actively labeling streaming data, where an active learner is faced
with a stream of data points and must carefully choose which of these points to
label via an expensive experiment. Such problems frequently arise in
applications such as healthcare and astronomy. We first study a setting when
the data's inputs belong to one of $K$ discrete distributions and formalize
this problem via a loss that captures the labeling cost and the prediction
error. When the labeling cost is $B$, our algorithm, which chooses to label a
point if the uncertainty is larger than a time and cost dependent threshold,
achieves a worst-case upper bound of $\widetilde{O}(B^{\frac{1}{3}}
K^{\frac{1}{3}} T^{\frac{2}{3}})$ on the loss after $T$ rounds. We also provide
a more nuanced upper bound which demonstrates that the algorithm can adapt to
the arrival pattern, and achieves better performance when the arrival pattern
is more favorable. We complement both upper bounds with matching lower bounds.
We next study this problem when the inputs belong to a continuous domain and
the output of the experiment is a smooth function with bounded RKHS norm. After
$T$ rounds in $d$ dimensions, we show that the loss is bounded by
$\widetilde{O}(B^{\frac{1}{d+3}} T^{\frac{d+2}{d+3}})$ in an RKHS with a
squared exponential kernel and by $\widetilde{O}(B^{\frac{1}{2d+3}}
T^{\frac{2d+2}{2d+3}})$ in an RKHS with a Mat\'ern kernel. Our empirical
evaluation demonstrates that our method outperforms other baselines in several
synthetic experiments and two real experiments in medicine and astronomy.
Related papers
- Active Subsampling for Measurement-Constrained M-Estimation of Individualized Thresholds with High-Dimensional Data [3.1138411427556445]
In the measurement-constrained problems, despite the availability of large datasets, we may be only affordable to observe the labels on a small portion of the large dataset.
This poses a critical question that which data points are most beneficial to label given a budget constraint.
In this paper, we focus on the estimation of the optimal individualized threshold in a measurement-constrained M-estimation framework.
arXiv Detail & Related papers (2024-11-21T00:21:17Z) - Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
conditional distributions is a central problem in machine learning.
We propose a new learning paradigm that integrates both paired and unpaired data.
Our approach also connects intriguingly with inverse entropic optimal transport (OT)
arXiv Detail & Related papers (2024-10-03T16:12:59Z) - Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(varepsilon,delta)$PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Towards the Fundamental Limits of Knowledge Transfer over Finite Domains [8.575522204707957]
We show that privileged information at three progressive levels accelerates the transfer.
At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate $sqrt|mathcal Smathcal A|/n$.
The third level further equips the student with the soft labels (complete logits) on $mathcal A$ given every sampled input, thereby provably enables the student to enjoy a rate $|mathcal S|/n$ free of $
arXiv Detail & Related papers (2023-10-11T19:30:08Z) - Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
We show that our algorithm achieves an $tildeObig((d+log (|mathcalS|2 |mathcalA|))sqrtKbig)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, $|mathcalS|$ and $|mathcalA|$ are the cardinalities of the state and action spaces
arXiv Detail & Related papers (2023-05-15T05:37:32Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP)
We prove an upper bound of $widetilde O(H varepsilon-3 ln frac1delta)$ samples per state-action pair, where $H := sp(h*)$ is the span of bias of any optimal policy, $varepsilon$ is the accuracy and $delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Computationally Efficient Horizon-Free Reinforcement Learning for Linear
Mixture MDPs [111.75736569611159]
We propose the first computationally efficient horizon-free algorithm for linear mixture MDPs.
Our algorithm adapts a weighted least square estimator for the unknown transitional dynamic.
This also improves upon the best-known algorithms in this setting when $sigma_k2$'s are known.
arXiv Detail & Related papers (2022-05-23T17:59:18Z) - Coresets for Classification -- Simplified and Strengthened [19.54307474041768]
We give relative error coresets for training linear classifiers with a broad class of loss functions.
Our construction achieves $tilde O(d cdot mu_y(X)2/epsilon2)$ points, where $mu_y(X)$ is a natural complexity measure of the data matrix $X in mathbbRn times d$ and label vector $y in -1,1n$.
arXiv Detail & Related papers (2021-06-08T11:24:18Z) - Provably Breaking the Quadratic Error Compounding Barrier in Imitation
Learning, Optimally [58.463668865380946]
We study the statistical limits of Imitation Learning in episodic Markov Decision Processes (MDPs) with a state space $mathcalS$.
We establish an upper bound $O(|mathcalS|H3/2/N)$ for the suboptimality using the Mimic-MD algorithm in Rajaraman et al ( 2020)
We show the minimax suboptimality grows as $Omega( H3/2/N)$ when $mathcalS|geq 3$ while the unknown-transition setting suffers from a larger sharp rate
arXiv Detail & Related papers (2021-02-25T15:50:19Z) - Online Robust Regression via SGD on the l1 loss [19.087335681007477]
We consider the robust linear regression problem in the online setting where we have access to the data in a streaming manner.
We show in this work that the descent on the $ell_O( 1 / (1 - eta)2 n )$ loss converges to the true parameter vector at a $tildeO( 1 / (1 - eta)2 n )$ rate which is independent of the values of the contaminated measurements.
arXiv Detail & Related papers (2020-07-01T11:38:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.