Active Covering
- URL: http://arxiv.org/abs/2106.02552v1
- Date: Fri, 4 Jun 2021 15:32:39 GMT
- Title: Active Covering
- Authors: Heinrich Jiang, Afshin Rostamizadeh
- Abstract summary: We analyze the problem of active covering, where the learner is given an unlabeled dataset and can sequentially query the labels of examples.
The objective is to query the labels of all of the positive examples using the fewest total label queries.
- Score: 37.525977525895605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We analyze the problem of active covering, where the learner is given
an unlabeled dataset and can sequentially query the labels of examples. The
objective is to query the labels of all of the positive examples in the fewest
number of total label queries. We show under standard non-parametric
assumptions that a classical
support estimator can be repurposed as an offline algorithm attaining an excess
query cost of $\widetilde{\Theta}(n^{D/(D+1)})$ compared to the optimal
learner, where $n$ is the number of datapoints and $D$ is the dimension. We
then provide a simple active learning method that attains an improved excess
query cost of $\widetilde{O}(n^{(D-1)/D})$. Furthermore, the proposed
algorithms only require access to the positive labeled examples, which in
certain settings provides additional computational and privacy benefits.
Finally, we show that the active learning method consistently outperforms
offline methods as well as a variety of baselines on a wide range of benchmark
image-based datasets.
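To make the problem concrete, below is a minimal simulation sketch of one plausible active covering strategy: seed with uniform random label queries, then repeatedly query the unlabeled point nearest to an already-discovered positive. The `label_oracle` callback, the seeding rule, and the stopping condition are assumptions for illustration; this is a sketch of the general idea, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def active_cover(X, label_oracle, n_positives, n_seed=10, seed=0):
    """Query labels until all `n_positives` positives in X are found.

    `label_oracle(i)` is a hypothetical callback returning the 0/1 label
    of X[i]; `n_positives` is assumed known here only so the simulation
    knows when to stop. Returns the found positives and the query count.
    """
    rng = np.random.default_rng(seed)
    unlabeled = set(range(len(X)))
    positives, queries = [], 0

    def query(i):
        nonlocal queries
        queries += 1
        unlabeled.discard(i)
        if label_oracle(i) == 1:
            positives.append(i)

    # Seed phase: uniform random queries to locate a first positive.
    for i in rng.permutation(len(X))[:n_seed]:
        query(int(i))

    # Expansion phase: query the unlabeled point closest to any known
    # positive; fall back to random queries while none has been found.
    while len(positives) < n_positives and unlabeled:
        cand = np.fromiter(unlabeled, dtype=int)
        if positives:
            nn = NearestNeighbors(n_neighbors=1).fit(X[positives])
            dists, _ = nn.kneighbors(X[cand])
            query(int(cand[np.argmin(dists)]))
        else:
            query(int(rng.choice(cand)))
    return positives, queries
```

The intuition mirrors the abstract: once positives are found, queries concentrate around the positive region rather than being spent uniformly over the dataset. Consistent with the abstract's remark, the expansion step uses only the positively labeled examples (plus unlabeled features), not the queried negatives.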
Related papers
- Online Active Regression [8.397196353612042]
We consider an online extension of the active regression problem: the learner receives data points one by one and decides whether it should collect the corresponding labels.
The goal is to efficiently maintain an accurate regression estimate over the received data points with a small budget of label queries; a toy sketch of one such query rule follows this entry.
arXiv Detail & Related papers (2022-07-13T03:53:25Z)
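As a toy illustration of this online setting (not the paper's method), one plausible rule queries a label only when the arriving point has high leverage under the current design matrix; the threshold `tau`, the ridge update, and the function name are assumptions of this sketch.

```python
import numpy as np

def online_active_ridge(stream, dim, budget, lam=1.0, tau=0.5):
    """Sketch of an online active regression loop: query a label only
    when the incoming point lies in a direction the labeled set covers
    poorly (high leverage). Illustrative assumptions throughout."""
    A = lam * np.eye(dim)         # regularized Gram matrix of queried points
    b = np.zeros(dim)
    for x, get_label in stream:   # get_label() reveals y at unit label cost
        leverage = float(x @ np.linalg.solve(A, x))
        if budget > 0 and leverage > tau:
            y = get_label()       # pay for this label
            budget -= 1
            A += np.outer(x, x)
            b += y * x
    return np.linalg.solve(A, b)  # ridge estimate from the queried labels
```

Leverage-style rules of this kind reserve the label budget for directions of feature space that past queries have not yet covered.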
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples; a sketch of one such formulation follows this entry.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
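As a sketch of what such a constrained formulation could look like (the risk $R$, slack $\epsilon$, loss $\ell$, and labeled set $L$ are assumed notation, not taken from the paper), the learning problem and its Lagrangian would be:

$$\min_{\theta} \; R(\theta) \quad \text{s.t.} \quad \ell(f_\theta(x_i), y_i) \le \epsilon \quad \forall i \in L$$

$$\mathcal{L}(\theta, \lambda) = R(\theta) + \sum_{i \in L} \lambda_i \big( \ell(f_\theta(x_i), y_i) - \epsilon \big), \qquad \lambda_i \ge 0$$

Under this reading, a large optimal dual variable $\lambda_i$ flags a sample whose constraint is hard to satisfy, giving a natural per-sample score for acquisition.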
- Adaptive Multi-Goal Exploration [118.40427257364729]
We show how AdaGoal can be used to tackle the objective of learning an $\epsilon$-optimal goal-conditioned policy.
AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.
arXiv Detail & Related papers (2021-11-23T17:59:50Z)
- A Simple Baseline for Low-Budget Active Learning [15.54250249254414]
We show that a simple k-means clustering algorithm can outperform state-of-the-art active learning methods on low budgets.
This method can be used as a simple baseline for low-budget active learning on image classification; a sketch of the selection rule follows this entry.
arXiv Detail & Related papers (2021-10-22T19:36:56Z)
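A minimal sketch of that baseline, under the common reading that the pool is clustered into as many groups as the label budget and the sample nearest each centroid is labeled (the function name and details are assumptions, not the paper's code):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_select(features, budget, seed=0):
    """Pick `budget` representative indices from an unlabeled pool by
    clustering and taking the sample closest to each centroid."""
    km = KMeans(n_clusters=budget, random_state=seed, n_init=10).fit(features)
    picks = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c],
                               axis=1)
        picks.append(int(members[np.argmin(dists)]))
    return picks
```

Because the selection covers the feature space by construction, it needs no trained model or uncertainty estimates, which is what makes it viable at very low budgets.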
- How to Query An Oracle? Efficient Strategies to Label Data [59.89900843097016]
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning.
We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(\frac{Nk}{2})$.
In addition, we present an adaptive greedy query scheme, which achieves an average rate of $\approx 0.2N$ queries per sample with triplet queries; a sketch of the basic pairwise query model follows this entry.
arXiv Detail & Related papers (2021-10-05T20:15:35Z)
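For intuition about the query model (this is the naive strategy, not the paper's randomized batch or adaptive triplet schemes), pairwise same-class queries can label a pool by comparing each item against one representative of every class discovered so far, which costs on the order of k/2 queries per item on average for k classes:

```python
import random

def label_by_pairwise_queries(ids, same_class, seed=0):
    """Naive pairwise-query labeling. `same_class(a, b)` is the oracle
    answering whether items a and b share a class; `ids` are hashable
    item identifiers. Returns the class assignment and the query count."""
    rng = random.Random(seed)
    reps, assignment, queries = [], {}, 0  # one representative per class
    for s in ids:
        rng.shuffle(reps)                  # randomize the probe order
        for r in reps:
            queries += 1
            if same_class(s, r):
                assignment[s] = assignment[r]
                break
        else:                              # no match: s starts a new class
            assignment[s] = len(reps)
            reps.append(s)
    return assignment, queries
```

The gap between this naive per-sample rate and the cost of smarter strategies is the kind of gap the batch and triplet schemes in the paper aim to close.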
- Zero-Round Active Learning [13.25385227263705]
Active learning (AL) aims at reducing labeling effort by identifying the most valuable unlabeled data points from a large pool.
Traditional AL frameworks have two limitations, the first being that they perform data selection in a multi-round manner, which is time-consuming and impractical.
Recent work proposes a solution for one-round active learning based on data utility learning and optimization.
arXiv Detail & Related papers (2021-07-14T13:47:13Z)
- Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach [81.19737119343438]
Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label.
We propose a new integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool; one plausible form of such a program follows this entry.
Our strategy requires high-quality latent features which we obtain by unsupervised learning on the unlabeled pool.
arXiv Detail & Related papers (2021-06-05T21:25:03Z)
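For concreteness, one plausible shape for such an integer program (the unit-mass marginals, budget $b$, and metric $d$ are assumptions of this sketch, not the paper's exact formulation) selects points with binary variables $z_j$ and moves each pool point's mass onto selected points with transport variables $\gamma_{ij}$:

$$\min_{z \in \{0,1\}^n,\; \gamma \ge 0} \;\; \sum_{i,j} \gamma_{ij}\, d(x_i, x_j)$$

$$\text{s.t.} \quad \sum_j \gamma_{ij} = \tfrac{1}{n} \;\; \forall i, \qquad \gamma_{ij} \le z_j \;\; \forall i,j, \qquad \sum_j z_j = b$$

Minimizing the transport cost makes the $b$ selected points a distributional surrogate for the whole pool, which is why high-quality latent features matter so much for this strategy.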
- How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z)
- Active$^2$ Learning: Actively reducing redundancies in Active Learning methods for Sequence Tagging and Machine Translation [14.030275887949147]
Active Learning (AL) strategies reduce the need for huge volumes of labeled data by iteratively selecting a small number of examples for manual annotation.
In this paper, we argue that since AL strategies choose examples independently, they may potentially select similar examples, all of which may not contribute significantly to the learning process.
Our proposed approach, Active$^2$ Learning (A$^2$L), actively adapts to the deep learning model being trained to eliminate such redundant examples.
arXiv Detail & Related papers (2021-03-11T06:27:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.