Active Learning from Crowd in Document Screening
- URL: http://arxiv.org/abs/2012.02297v1
- Date: Wed, 11 Nov 2020 16:17:28 GMT
- Title: Active Learning from Crowd in Document Screening
- Authors: Evgeny Krivosheev, Burcu Sayin, Alessandro Bozzon, Zoltán Szlávik
- Abstract summary: We focus on building a set of machine learning classifiers that evaluate documents and then screen them efficiently.
We propose objective-aware sampling, a screening-specific sampling technique for multi-label active learning.
We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.
- Score: 76.9545252341746
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we explore how to efficiently combine crowdsourcing and
machine intelligence for the problem of document screening, where we need to
screen documents with a set of machine-learning filters. Specifically, we focus
on building a set of machine learning classifiers that evaluate documents and
then screen them efficiently. This is a challenging task, since the budget is
limited and there are countless ways to spend it on the problem. We propose a
screening-specific, multi-label active learning sampling technique -- objective-aware
sampling -- for selecting unlabelled documents to annotate. Our algorithm decides
which machine filters need more training data and which unlabeled items to
annotate in order to minimize the risk of overall classification error rather
than the error of any single filter. We demonstrate that objective-aware sampling
significantly outperforms state-of-the-art active learning sampling strategies.
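The abstract describes the decision rule only at a high level. Below is a minimal, hypothetical sketch of the objective-aware idea, assuming a screening pipeline of independent binary filters where a document is screened in only if every filter accepts it; the function names and the exact scoring heuristic are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of objective-aware sampling: score (document, filter)
# pairs by how much a filter's uncertainty can change the overall
# screen-in/screen-out outcome, assuming independent filters.

def objective_aware_scores(filter_probs):
    """filter_probs: (n_docs, n_filters) array of each filter's predicted
    P(include). Returns one score per (document, filter) pair."""
    n_docs, n_filters = filter_probs.shape
    scores = np.zeros_like(filter_probs)
    for j in range(n_filters):
        # P(all other filters let the document through)
        others = np.delete(filter_probs, j, axis=1).prod(axis=1)
        # per-filter uncertainty: 1 at p = 0.5, 0 at p = 0 or p = 1
        uncertainty = 1.0 - 2.0 * np.abs(filter_probs[:, j] - 0.5)
        # weight uncertainty by its impact on the final screening decision
        scores[:, j] = uncertainty * others
    return scores

def select_queries(filter_probs, budget):
    """Return the `budget` highest-scoring (document, filter) pairs
    to send to the crowd for annotation."""
    scores = objective_aware_scores(filter_probs)
    top = np.argsort(scores, axis=None)[::-1][:budget]
    return [np.unravel_index(i, scores.shape) for i in top]
```

Plain uncertainty sampling would rank pairs by the `uncertainty` term alone; the `others` factor is what makes this sketch objective-aware, steering annotations toward filters whose errors can actually change the final screening outcome.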
Related papers
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Cold PAWS: Unsupervised class discovery and addressing the cold-start problem for semi-supervised learning [0.30458514384586394]
We propose a novel approach based on well-established self-supervised learning, clustering, and manifold learning techniques.
We test our approach using several publicly available datasets, namely CIFAR10, Imagenette, DeepWeeds, and EuroSAT.
We obtain superior performance for the datasets considered with a much simpler approach compared to other methods in the literature.
arXiv Detail & Related papers (2023-05-17T09:17:59Z)
- ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework named ALBench for evaluating active learning in object detection.
Developed on an automatic deep model training system, the ALBench framework is easy to use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z)
- Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study of resource task sampling by leveraging techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
arXiv Detail & Related papers (2022-02-02T08:23:24Z)
- Using Self-Supervised Pretext Tasks for Active Learning [7.214674613451605]
We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative.
The pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and grouped into batches by their pretext task losses.
In each iteration, the main task model is used to sample the most uncertain data in a batch for annotation (a loose sketch of this loop appears after this list).
arXiv Detail & Related papers (2022-01-19T07:58:06Z)
- Budget-aware Few-shot Learning via Graph Convolutional Network [56.41899553037247]
This paper tackles the problem of few-shot learning, which aims to learn new visual concepts from a few examples.
A common problem setting in few-shot classification assumes a random sampling strategy for acquiring data labels.
We introduce a new budget-aware few-shot learning problem that aims to learn novel object categories.
arXiv Detail & Related papers (2022-01-07T02:46:35Z)
- Adaptive Sample Selection for Robust Learning under Label Noise [1.71982924656402]
Deep Neural Networks (DNNs) have been shown to be susceptible to memorization or overfitting in the presence of noisily labelled data.
A prominent class of algorithms rely on sample selection strategies, motivated by curriculum learning.
We propose a data-dependent, adaptive sample selection strategy that relies only on batch statistics.
arXiv Detail & Related papers (2021-06-29T12:10:58Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
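For the "Using Self-Supervised Pretext Tasks for Active Learning" entry above, here is a loose sketch of the described batching-and-sampling loop. The hardest-first ordering and the use of predictive entropy as the uncertainty measure are assumptions, not details taken from that paper.

```python
import numpy as np

# Loose sketch of the batching-and-sampling loop summarized above.
# Hardest-first ordering and entropy as the uncertainty measure are
# assumptions, not details stated in the summary.

def batches_by_pretext_loss(pretext_losses, n_batches):
    """Sort unlabeled indices by pretext-task loss (hardest first)
    and split them into batches."""
    order = np.argsort(pretext_losses)[::-1]
    return np.array_split(order, n_batches)

def most_uncertain_in_batch(batch, main_task_probs, k):
    """Pick the k samples in the batch whose main-task predictions
    have the highest entropy; these are sent for annotation."""
    probs = main_task_probs[batch]
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return batch[np.argsort(entropy)[::-1][:k]]
```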