Leveraging Importance Weights in Subset Selection
- URL: http://arxiv.org/abs/2301.12052v1
- Date: Sat, 28 Jan 2023 02:07:31 GMT
- Title: Leveraging Importance Weights in Subset Selection
- Authors: Gui Citovsky, Giulia DeSalvo, Sanjiv Kumar, Srikumar Ramalingam,
Afshin Rostamizadeh, Yunjuan Wang
- Abstract summary: We present a subset selection algorithm designed to work with arbitrary model families in a practical batch setting.
Our algorithm, IWeS, selects examples by importance sampling where the sampling probability assigned to each example is based on the entropy of models trained on previously selected batches.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a subset selection algorithm designed to work with arbitrary model
families in a practical batch setting. In such a setting, an algorithm can
sample examples one at a time but, in order to limit overhead costs, is only
able to update its state (i.e. further train model weights) once a large enough
batch of examples is selected. Our algorithm, IWeS, selects examples by
importance sampling where the sampling probability assigned to each example is
based on the entropy of models trained on previously selected batches. IWeS
achieves significant performance improvements over other subset selection
algorithms on seven publicly available datasets. Additionally, it is
competitive in an active learning setting, where the label information is not
available at selection time. We also provide an initial theoretical analysis to
support our importance weighting approach, proving generalization and sampling
rate bounds.
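To make the selection rule concrete, here is a minimal sketch of entropy-based importance sampling in the spirit of the abstract. This is not the authors' implementation: the `predict_proba` interface, the entropy-to-probability normalization, and the `floor` parameter are illustrative assumptions.

```python
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy of each row of predicted class probabilities."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def iwes_style_select(model, pool_X, batch_size, floor=0.01, rng=None):
    """One selection round: entropy-based importance sampling (sketch).

    Each pool example is kept with probability proportional to the
    entropy of the current model's prediction on it, clipped below by
    `floor` so every example retains a nonzero chance of selection.
    Selected examples carry importance weight 1/p, which keeps the
    weighted training loss an unbiased estimate of the full-pool loss.
    """
    rng = rng or np.random.default_rng()
    probs = model.predict_proba(pool_X)               # (n_pool, n_classes); assumed API
    h = prediction_entropy(probs)
    p = np.clip(h / max(h.max(), 1e-12), floor, 1.0)  # per-example sampling probability
    keep = rng.random(len(p)) < p                     # independent coin flips
    idx = np.flatnonzero(keep)[:batch_size]           # cap at the batch budget
    return idx, 1.0 / p[idx]                          # indices and importance weights
```

Across rounds, the model would be retrained on the union of selected batches with these per-example weights in the loss, and the next round's probabilities would come from the freshly trained model, matching the batch setting described above.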
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Dataset Quantization with Active Learning based Adaptive Sampling [11.157462442942775]
We show that maintaining performance is feasible even with uneven sample distributions.
We propose a novel active learning based adaptive sampling strategy to optimize the sample selection.
Our approach outperforms the state-of-the-art dataset compression methods.
arXiv Detail & Related papers (2024-07-09T23:09:18Z)
- Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
- ActiveDC: Distribution Calibration for Active Finetuning [36.64444238742072]
We propose a new method called ActiveDC for the active finetuning tasks.
We calibrate the distribution of the selected samples by exploiting implicit category information in the unlabeled pool.
The results indicate that ActiveDC consistently outperforms the baselines in all image classification tasks.
arXiv Detail & Related papers (2023-11-13T14:35:18Z)
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that repeatedly alternates between time-consuming model training and batch data selection.
FreeSel bypasses this heavy batch selection process, achieving a significant efficiency improvement: it is 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
- Efficient Failure Pattern Identification of Predictive Algorithms [15.02620042972929]
We propose a human-machine collaborative framework that consists of a team of human annotators and a sequential recommendation algorithm.
The results empirically demonstrate the competitive performance of our framework on multiple datasets at various signal-to-noise ratios.
arXiv Detail & Related papers (2023-06-01T14:54:42Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that trains on unlabeled examples selected by a dynamically adjusted threshold.
Our proposed approach, Dash, is adaptive in how it selects unlabeled data.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- Batch Active Learning at Scale [39.26441165274027]
Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for reducing labeling cost.
In this work, we analyze an efficient active learning algorithm designed for the large-batch setting.
We show that our sampling method, which combines notions of uncertainty and diversity, scales easily to batch sizes (100K-1M) several orders of magnitude larger than those used in previous studies; a sketch of this uncertainty-plus-diversity selection pattern appears after this list.
arXiv Detail & Related papers (2021-07-29T18:14:05Z)
- Online Active Model Selection for Pre-trained Classifiers [72.84853880948894]
We design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round.
Our algorithm can be used for online prediction tasks for both adversarial and stochastic streams.
arXiv Detail & Related papers (2020-10-19T19:53:15Z)
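As noted in the Batch Active Learning at Scale entry above, a common way to combine uncertainty and diversity at large batch sizes is to shortlist low-margin (uncertain) examples and then diversify the shortlist via clustering. The sketch below illustrates that general pattern only; the `overcluster` factor, the margin score, and the clustering choice are assumptions, not that paper's exact algorithm.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def uncertainty_diversity_batch(probs, embeddings, batch_size, overcluster=10, rng=None):
    """Shortlist by small prediction margin, then take one example per cluster.

    probs:      (n, k) predicted class probabilities
    embeddings: (n, d) feature representations used for the diversity step
    Assumes the shortlist contains at least `batch_size` examples.
    """
    rng = rng or np.random.default_rng()
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]                       # small margin = uncertain
    cand = np.argsort(margin)[: batch_size * overcluster]  # most uncertain shortlist
    labels = AgglomerativeClustering(n_clusters=batch_size).fit_predict(embeddings[cand])
    # one representative per cluster keeps the batch diverse
    return np.asarray([rng.choice(cand[labels == c]) for c in range(batch_size)])
```

Because clustering runs only on the shortlist rather than the full pool, this style of two-stage selection scales to the 100K-1M batch sizes mentioned above.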