Minimum Cost Active Labeling
- URL: http://arxiv.org/abs/2006.13999v1
- Date: Wed, 24 Jun 2020 19:01:05 GMT
- Title: Minimum Cost Active Labeling
- Authors: Hang Qiu, Krishna Chintalapudi, Ramesh Govindan
- Abstract summary: min-cost labeling uses a variant of active learning to learn a model to predict the optimal training set size.
In some cases, our approach has 6X lower overall cost relative to human labeling, and is always cheaper than the cheapest active learning strategy.
- Score: 2.0754848504005587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Labeling a data set completely is important for ground-truth generation. In
this paper, we consider the problem of minimum-cost labeling: classifying all
images in a large data set with a target accuracy bound at minimum dollar cost.
Human labeling can be prohibitive, so we train a classifier to accurately label
part of the data set. However, training the classifier can be expensive too,
particularly with active learning. Our min-cost labeling uses a variant of
active learning to learn a model to predict the optimal training set size for
the classifier that minimizes overall cost, then uses active learning to train
the classifier to maximize the number of samples the classifier can correctly
label. We validate our approach on well-known public data sets such as Fashion,
CIFAR-10, and CIFAR-100. In some cases, our approach has 6X lower overall cost
relative to human labeling, and is always cheaper than the cheapest active
learning strategy.
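The trade-off the abstract describes (human labeling is expensive, but so is training a classifier to replace it) can be sketched numerically. The snippet below is a hedged illustration only: the coverage curve, cost constants, and function names are invented here, not taken from the paper, which instead learns the coverage behavior with a variant of active learning.

```python
def total_cost(n_train, dataset_size, label_cost, train_cost_per_sample, coverage):
    """Overall dollar cost when n_train samples are human-labeled to train a
    classifier that can confidently auto-label a fraction coverage(n_train)
    of the remaining samples; the rest fall back to human labeling."""
    human_for_training = n_train * label_cost
    compute = n_train * train_cost_per_sample
    remaining = dataset_size - n_train
    human_for_rest = remaining * (1.0 - coverage(n_train)) * label_cost
    return human_for_training + compute + human_for_rest

def toy_coverage(n):
    # Invented saturating curve: more training data -> more auto-labelable samples.
    return min(0.95, n / (n + 2000.0))

# Sweep candidate training-set sizes and pick the cheapest one.
sizes = range(500, 20001, 500)
best = min(sizes, key=lambda n: total_cost(n, 100_000, 0.10, 0.001, toy_coverage))
print(best, round(total_cost(best, 100_000, 0.10, 0.001, toy_coverage), 2))
# prints: 12000 2469.14
```

With these toy numbers the minimum sits at an intermediate training-set size: too few labels and most of the data must still be human-labeled; too many and the training labels themselves dominate the cost.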
Related papers
- Calpric: Inclusive and Fine-grain Labeling of Privacy Policies with Crowdsourcing and Active Learning [5.279873919047532]
We present Calpric, which combines automatic text selection and segmentation, active learning, and crowdsourced annotators to generate a large, balanced training set for privacy policies at low cost.
Calpric's training process also generates a labeled data set of 16K privacy policy text segments across 9 Data categories with balanced positive and negative samples.
arXiv Detail & Related papers (2024-01-16T01:27:26Z)
- Tackling Concept Shift in Text Classification using Entailment-style Modeling [2.2588825300186426]
We propose a reformulation, converting vanilla classification into an entailment-style problem.
We demonstrate the effectiveness of our proposed method on both real world & synthetic datasets.
arXiv Detail & Related papers (2023-11-06T18:15:36Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Learning from Multiple Unlabeled Datasets with Partial Risk Regularization [80.54710259664698]
In this paper, we aim to learn an accurate classifier without any class labels.
We first derive an unbiased estimator of the classification risk that can be estimated from the given unlabeled sets.
We then find that the classifier obtained this way tends to overfit, as its empirical risks go negative during training.
Experiments demonstrate that our method effectively mitigates overfitting and outperforms state-of-the-art methods for learning from multiple unlabeled sets.
arXiv Detail & Related papers (2022-07-04T16:22:44Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive numbers of accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Cost-Accuracy Aware Adaptive Labeling for Active Learning [9.761953860259942]
In many real settings, different labelers have different labeling costs and can yield different labeling accuracies.
We propose a new algorithm for selecting instances, labelers and their corresponding costs and labeling accuracies.
Our proposed algorithm demonstrates state-of-the-art performance on five UCI and a real crowdsourcing dataset.
arXiv Detail & Related papers (2021-05-24T17:21:00Z)
- PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which utilizes the ratio of positive to negative labels for each class during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
arXiv Detail & Related papers (2021-05-22T18:07:56Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
- Labels, Information, and Computation: Efficient, Privacy-Preserving Learning Using Sufficient Labels [0.0]
We show that we do not always need full label information on every single training example.
We call this statistic "sufficiently-labeled data" and prove its sufficiency and efficiency.
Sufficiently-labeled data naturally preserves user privacy by storing relative, instead of absolute, information.
arXiv Detail & Related papers (2021-04-19T02:15:25Z)
- How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z)
- Active Learning for Noisy Data Streams Using Weak and Strong Labelers [3.9370369973510746]
We consider a novel weak and strong labeler problem inspired by humans' natural ability to label.
We propose an on-line active learning algorithm that consists of four steps: filtering, adding diversity, informative sample selection, and labeler selection.
We derive a decision function that measures the information gain by combining the informativeness of individual samples and model confidence.
arXiv Detail & Related papers (2020-10-27T09:18:35Z)
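Several of the papers above build on pool-based active learning with uncertainty sampling: train on the labeled set, query the oracle for the pool point the model is least certain about, and repeat. The sketch below is a generic illustration, not any single paper's algorithm; the toy 1-D task, the oracle, and all constants are invented for the example.

```python
import random

# Toy task: label 1-D points as positive iff x >= TRUE_THRESHOLD.
# The "model" estimates the threshold as the midpoint between the
# largest labeled negative and the smallest labeled positive.
random.seed(0)
TRUE_THRESHOLD = 0.6
pool = [random.random() for _ in range(1000)]

def oracle(x):
    # Human labeler, assumed perfect for this illustration.
    return x >= TRUE_THRESHOLD

def fit(labeled):
    neg = [x for x, y in labeled if not y]
    pos = [x for x, y in labeled if y]
    lo = max(neg, default=0.0)
    hi = min(pos, default=1.0)
    return (lo + hi) / 2.0  # estimated decision boundary

labeled = [(x, oracle(x)) for x in random.sample(pool, 2)]  # seed set
for _ in range(10):
    boundary = fit(labeled)
    # Uncertainty sampling: query the pool point closest to the
    # current boundary, i.e. the one the model is least sure about.
    x = min(pool, key=lambda p: abs(p - boundary))
    labeled.append((x, oracle(x)))
    pool.remove(x)

print(abs(fit(labeled) - TRUE_THRESHOLD) < 0.05)
```

Because each query lands near the current boundary, the loop behaves like a bisection search: roughly ten queries pin down the threshold far more cheaply than labeling points at random.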
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.