Active Covering
- URL: http://arxiv.org/abs/2106.02552v1
- Date: Fri, 4 Jun 2021 15:32:39 GMT
- Title: Active Covering
- Authors: Heinrich Jiang, Afshin Rostamizadeh
- Abstract summary: We analyze the problem of active covering, where the learner is given an unlabeled dataset and can sequentially query the labels of examples.
The objective is to query the labels of all of the positive examples using the fewest total label queries.
- Score: 37.525977525895605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We analyze the problem of active covering, where the learner is given
an unlabeled dataset and can sequentially query the labels of examples. The
objective is to query the labels of all of the positive examples in the fewest
number of total label queries. We show under standard non-parametric
assumptions that a classical
support estimator can be repurposed as an offline algorithm attaining an excess
query cost of $\widetilde{\Theta}(n^{D/(D+1)})$ compared to the optimal
learner, where $n$ is the number of datapoints and $D$ is the dimension. We
then provide a simple active learning method that attains an improved excess
query cost of $\widetilde{O}(n^{(D-1)/D})$. Furthermore, the proposed
algorithms only require access to the positive labeled examples, which in
certain settings provides additional computational and privacy benefits.
Finally, we show that the active learning method consistently outperforms
offline methods as well as a variety of baselines on a wide range of benchmark
image-based datasets.
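To make the problem concrete, below is a minimal simulation sketch of one plausible active covering strategy: seed with uniform random label queries, then repeatedly query the unlabeled point nearest to an already-discovered positive. The `label_oracle` callback, the seeding rule, and the stopping condition are assumptions for illustration; this is a sketch of the general idea, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def active_cover(X, label_oracle, n_positives, n_seed=10, seed=0):
    """Query labels until all `n_positives` positives in X are found.

    `label_oracle(i)` is a hypothetical callback returning the 0/1 label
    of X[i]; `n_positives` is assumed known here only so the simulation
    knows when to stop. Returns the found positives and the query count.
    """
    rng = np.random.default_rng(seed)
    unlabeled = set(range(len(X)))
    positives, queries = [], 0

    def query(i):
        nonlocal queries
        queries += 1
        unlabeled.discard(i)
        if label_oracle(i) == 1:
            positives.append(i)

    # Seed phase: uniform random queries to locate a first positive.
    for i in rng.permutation(len(X))[:n_seed]:
        query(int(i))

    # Expansion phase: query the unlabeled point closest to any known
    # positive; fall back to random queries while none has been found.
    while len(positives) < n_positives and unlabeled:
        cand = np.fromiter(unlabeled, dtype=int)
        if positives:
            nn = NearestNeighbors(n_neighbors=1).fit(X[positives])
            dists, _ = nn.kneighbors(X[cand])
            query(int(cand[np.argmin(dists)]))
        else:
            query(int(rng.choice(cand)))
    return positives, queries
```

The intuition mirrors the abstract: once positives are found, queries concentrate around the positive region rather than being spent uniformly over the dataset. Consistent with the abstract's remark, the expansion step uses only the positively labeled examples (plus unlabeled features), not the queried negatives.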
Related papers
- Online Active Regression [8.397196353612042]
We consider an online extension of the active regression problem: the learner receives data points one by one and decides whether it should collect the corresponding labels.
The goal is to efficiently maintain an accurate regression estimate over the received data points with a small budget of label queries; a toy sketch of one such query rule follows this entry.
arXiv Detail & Related papers (2022-07-13T03:53:25Z)
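As a toy illustration of this online setting (not the paper's method), one plausible rule queries a label only when the arriving point has high leverage under the current design matrix; the threshold `tau`, the ridge update, and the function name are assumptions of this sketch.

```python
import numpy as np

def online_active_ridge(stream, dim, budget, lam=1.0, tau=0.5):
    """Sketch of an online active regression loop: query a label only
    when the incoming point lies in a direction the labeled set covers
    poorly (high leverage). Illustrative assumptions throughout."""
    A = lam * np.eye(dim)         # regularized Gram matrix of queried points
    b = np.zeros(dim)
    for x, get_label in stream:   # get_label() reveals y at unit label cost
        leverage = float(x @ np.linalg.solve(A, x))
        if budget > 0 and leverage > tau:
            y = get_label()       # pay for this label
            budget -= 1
            A += np.outer(x, x)
            b += y * x
    return np.linalg.solve(A, b)  # ridge estimate from the queried labels
```

Leverage-style rules of this kind reserve the label budget for directions of feature space that past queries have not yet covered.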
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples; a sketch of one such formulation follows this entry.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
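As a sketch of what such a constrained formulation could look like (the risk $R$, slack $\epsilon$, loss $\ell$, and labeled set $L$ are assumed notation, not taken from the paper), the learning problem and its Lagrangian would be:

$$\min_{\theta} \; R(\theta) \quad \text{s.t.} \quad \ell(f_\theta(x_i), y_i) \le \epsilon \quad \forall i \in L$$

$$\mathcal{L}(\theta, \lambda) = R(\theta) + \sum_{i \in L} \lambda_i \big( \ell(f_\theta(x_i), y_i) - \epsilon \big), \qquad \lambda_i \ge 0$$

Under this reading, a large optimal dual variable $\lambda_i$ flags a sample whose constraint is hard to satisfy, giving a natural per-sample score for acquisition.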
- Adaptive Multi-Goal Exploration [118.40427257364729]
We show how AdaGoal can be used to tackle the objective of learning an $\epsilon$-optimal goal-conditioned policy.
AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.
arXiv Detail & Related papers (2021-11-23T17:59:50Z)
- A Simple Baseline for Low-Budget Active Learning [15.54250249254414]
We show that a simple k-means clustering algorithm can outperform state-of-the-art active learning methods on low budgets.
This method can be used as a simple baseline for low-budget active learning on image classification; a sketch of the selection rule follows this entry.
arXiv Detail & Related papers (2021-10-22T19:36:56Z)
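A minimal sketch of that baseline, under the common reading that the pool is clustered into as many groups as the label budget and the sample nearest each centroid is labeled (the function name and details are assumptions, not the paper's code):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_select(features, budget, seed=0):
    """Pick `budget` representative indices from an unlabeled pool by
    clustering and taking the sample closest to each centroid."""
    km = KMeans(n_clusters=budget, random_state=seed, n_init=10).fit(features)
    picks = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c],
                               axis=1)
        picks.append(int(members[np.argmin(dists)]))
    return picks
```

Because the selection covers the feature space by construction, it needs no trained model or uncertainty estimates, which is what makes it viable at very low budgets.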
- How to Query An Oracle? Efficient Strategies to Label Data [59.89900843097016]
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning.
We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(\frac{Nk}{2})$.
In addition, we present an adaptive greedy query scheme, which achieves an average rate of $\approx 0.2N$ queries per sample with triplet queries; a sketch of the basic pairwise query model follows this entry.
arXiv Detail & Related papers (2021-10-05T20:15:35Z)
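For intuition about the query model (this is the naive strategy, not the paper's randomized batch or adaptive triplet schemes), pairwise same-class queries can label a pool by comparing each item against one representative of every class discovered so far, which costs on the order of k/2 queries per item on average for k classes:

```python
import random

def label_by_pairwise_queries(ids, same_class, seed=0):
    """Naive pairwise-query labeling. `same_class(a, b)` is the oracle
    answering whether items a and b share a class; `ids` are hashable
    item identifiers. Returns the class assignment and the query count."""
    rng = random.Random(seed)
    reps, assignment, queries = [], {}, 0  # one representative per class
    for s in ids:
        rng.shuffle(reps)                  # randomize the probe order
        for r in reps:
            queries += 1
            if same_class(s, r):
                assignment[s] = assignment[r]
                break
        else:                              # no match: s starts a new class
            assignment[s] = len(reps)
            reps.append(s)
    return assignment, queries
```

The gap between this naive per-sample rate and the cost of smarter strategies is the kind of gap the batch and triplet schemes in the paper aim to close.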
- Zero-Round Active Learning [13.25385227263705]
Active learning (AL) aims at reducing labeling effort by identifying the most valuable unlabeled data points from a large pool.
Traditional AL frameworks have two limitations, the first being that they perform data selection in a multi-round manner, which is time-consuming and impractical.
Recent work proposes a solution for one-round active learning based on data utility learning and optimization.
arXiv Detail & Related papers (2021-07-14T13:47:13Z)
- Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach [81.19737119343438]
Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label.
We propose a new integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool; one plausible form of such a program follows this entry.
Our strategy requires high-quality latent features which we obtain by unsupervised learning on the unlabeled pool.
arXiv Detail & Related papers (2021-06-05T21:25:03Z)
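For concreteness, one plausible shape for such an integer program (the unit-mass marginals, budget $b$, and metric $d$ are assumptions of this sketch, not the paper's exact formulation) selects points with binary variables $z_j$ and moves each pool point's mass onto selected points with transport variables $\gamma_{ij}$:

$$\min_{z \in \{0,1\}^n,\; \gamma \ge 0} \;\; \sum_{i,j} \gamma_{ij}\, d(x_i, x_j)$$

$$\text{s.t.} \quad \sum_j \gamma_{ij} = \tfrac{1}{n} \;\; \forall i, \qquad \gamma_{ij} \le z_j \;\; \forall i,j, \qquad \sum_j z_j = b$$

Minimizing the transport cost makes the $b$ selected points a distributional surrogate for the whole pool, which is why high-quality latent features matter so much for this strategy.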
- How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z)
- Active$^2$ Learning: Actively reducing redundancies in Active Learning methods for Sequence Tagging and Machine Translation [14.030275887949147]
Active Learning (AL) strategies reduce the need for huge volumes of labeled data by iteratively selecting a small number of examples for manual annotation.
In this paper, we argue that since AL strategies choose examples independently, they may potentially select similar examples, all of which may not contribute significantly to the learning process.
Our proposed approach, Active$^2$ Learning (A$^2$L), actively adapts to the deep learning model being trained to eliminate such redundant examples.
arXiv Detail & Related papers (2021-03-11T06:27:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.