Improved Adaptive Algorithm for Scalable Active Learning with Weak
Labeler
- URL: http://arxiv.org/abs/2211.02233v1
- Date: Fri, 4 Nov 2022 02:52:54 GMT
- Title: Improved Adaptive Algorithm for Scalable Active Learning with Weak
Labeler
- Authors: Yifang Chen, Karthik Sankararaman, Alessandro Lazaric, Matteo Pirotta,
Dmytro Karamshuk, Qifan Wang, Karishma Mandyam, Sinong Wang, Han Fang
- Abstract summary: Weak Labeler Active Cover (WL-AC) robustly leverages lower-quality weak labelers to reduce query complexity while retaining the desired level of accuracy.
We show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
- Score: 89.27610526884496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active learning with strong and weak labelers considers a practical setting
where we have access to both costly but accurate strong labelers and inaccurate
but cheap predictions provided by weak labelers. We study this problem in the
streaming setting, where decisions must be taken \textit{online}. We design a
novel algorithmic template, Weak Labeler Active Cover (WL-AC), that is able to
robustly leverage the lower quality weak labelers to reduce the query
complexity while retaining the desired level of accuracy. Prior active learning
algorithms with access to weak labelers learn a difference classifier that
predicts where the weak labels differ from the strong labels; this requires the
strong assumption of realizability of the difference classifier (Zhang and
Chaudhuri, 2015). WL-AC bypasses this \textit{realizability} assumption and is
thus applicable to many real-world scenarios, such as randomly corrupted weak
labels and high-dimensional families of difference classifiers (\textit{e.g.,}
deep neural nets). Moreover, WL-AC cleverly trades off evaluating the quality
of weak labelers against fully exploiting them, which makes it possible to
convert any active learning strategy into one that can leverage weak labelers.
We provide an instantiation of this template that achieves the optimal query
complexity for any given weak labeler, without knowing its accuracy a priori.
Empirically, we propose an instantiation of the WL-AC template that can be
efficiently implemented for large-scale models (\textit{e.g.,} deep neural
nets) and show its effectiveness
on the corrupted-MNIST dataset by significantly reducing the number of labels
while keeping the same accuracy as in passive learning.
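To make the evaluate-versus-exploit trade-off concrete, below is a minimal, hypothetical sketch of a WL-AC-style query rule. It is not the paper's algorithm: the `prediction_margin` oracle, the audit rate, and the Laplace-smoothed agreement estimate are illustrative stand-ins for the cover-based query probabilities that WL-AC actually computes.

```python
import random

class WeakLabelerActiveLearner:
    """Illustrative WL-AC-style query rule (hypothetical, not the paper's algorithm)."""

    def __init__(self, uncertainty_threshold=0.2, audit_rate=0.1):
        self.uncertainty_threshold = uncertainty_threshold
        self.audit_rate = audit_rate  # fraction of weak labels audited with strong queries
        self.agreements = 1           # Laplace-smoothed agreement counts
        self.audits = 2

    def agreement_estimate(self):
        # Running estimate of the weak labeler's quality, learned online
        # rather than assumed known a priori.
        return self.agreements / self.audits

    def label_for(self, x, model, weak_labeler, strong_labeler):
        # Confident region: query nobody, matching the usual active-learning
        # behavior of skipping easy examples in the stream.
        if model.prediction_margin(x) > self.uncertainty_threshold:  # assumed API
            return None
        weak_y = weak_labeler(x)
        # Audit occasionally, or fall back to the strong labeler whenever the
        # weak labeler's estimated agreement rate looks too low to trust.
        if random.random() < self.audit_rate or random.random() > self.agreement_estimate():
            strong_y = strong_labeler(x)  # costly query
            self.audits += 1
            self.agreements += int(strong_y == weak_y)
            return strong_y
        return weak_y                     # cheap query
```

Under this rule the strong labeler is queried often while the weak labeler's quality is still uncertain, and progressively less as its observed agreement rate climbs, mirroring the "evaluate, then fully exploit" behavior the abstract describes.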
Related papers
- Robust Representation Learning for Unreliable Partial Label Learning [86.909511808373]
Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth.
When the candidate labels themselves may be unreliable, the problem is known as Unreliable Partial Label Learning (UPLL), which introduces additional complexity due to the inherent unreliability and ambiguity of partial labels.
We propose the Unreliability-Robust Representation Learning framework (URRL), which leverages unreliability-robust contrastive learning to help the model withstand unreliable partial labels effectively.
arXiv Detail & Related papers (2023-08-31T13:37:28Z)
- AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to large amounts of unlabeled data.
arXiv Detail & Related papers (2023-02-07T07:12:05Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
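As a rough illustration of that idea (not the paper's implementation, which tracks the confidence distribution with running estimates and adds a uniform-alignment step), a truncated-Gaussian weighting over per-sample confidence might look like:

```python
import torch

def soft_confidence_weights(probs: torch.Tensor) -> torch.Tensor:
    # probs: (batch, num_classes) softmax outputs on unlabeled data.
    conf = probs.max(dim=-1).values  # per-sample confidence
    mu = conf.mean().detach()        # batch statistics stand in for the
    var = conf.var().detach()        # paper's EMA-tracked estimates
    gauss = torch.exp(-((conf - mu) ** 2) / (2 * var + 1e-8))
    # Samples at or above the mean confidence keep full weight; the rest are
    # down-weighted smoothly instead of being dropped by a hard threshold.
    return torch.where(conf >= mu, torch.ones_like(gauss), gauss)
```

Every pseudo-labeled sample then contributes to the unlabeled loss, weighted by these values, rather than only those passing a fixed confidence cutoff.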
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Losses over Labels: Weakly Supervised Learning via Direct Loss Construction [71.11337906077483]
Programmable weak supervision is a growing paradigm within machine learning.
We propose Losses over Labels (LoL), which creates losses directly from labeling functions without going through the intermediate step of a label.
We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks.
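One hedged reading of that idea in code (the paper's actual loss construction is richer): rather than aggregating labeling-function votes into a single pseudo-label and then applying a loss, one loss term per labeling function is averaged directly.

```python
import torch
import torch.nn.functional as F

def losses_over_labels_sketch(logits: torch.Tensor, lf_votes: list) -> torch.Tensor:
    # logits: (batch, num_classes); lf_votes: one (batch,) label tensor per
    # labeling function. Each function contributes its own loss term, so no
    # aggregated pseudo-label is ever formed (abstentions ignored for brevity).
    return torch.stack([F.cross_entropy(logits, votes) for votes in lf_votes]).mean()
```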
arXiv Detail & Related papers (2022-12-13T22:29:14Z)
- Transductive CLIP with Class-Conditional Contrastive Learning [68.51078382124331]
We propose Transductive CLIP, a novel framework for learning a classification network with noisy labels from scratch.
A class-conditional contrastive learning mechanism is proposed to mitigate the reliance on pseudo labels.
An ensemble-labels strategy is adopted for pseudo-label updating to stabilize the training of deep neural networks with noisy labels.
arXiv Detail & Related papers (2022-06-13T14:04:57Z)
- Label Noise-Resistant Mean Teaching for Weakly Supervised Fake News Detection [93.6222609806278]
We propose a novel label noise-resistant mean teaching approach (LNMT) for weakly supervised fake news detection.
LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data.
LNMT establishes a mean teacher framework equipped with label propagation and label reliability estimation.
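The mean-teacher backbone of such a framework is standard; a generic EMA update (a textbook sketch, not LNMT's code) looks like:

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.999) -> None:
    # The teacher's weights track an exponential moving average of the
    # student's, smoothing out the influence of individual noisy labels.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1 - decay)
```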
arXiv Detail & Related papers (2022-06-10T16:01:58Z)
- ATM: An Uncertainty-aware Active Self-training Framework for Label-efficient Text Classification [13.881283744970979]
ATM is a new framework that leverages self-training to exploit unlabeled data and is agnostic to the specific AL algorithm.
We demonstrate that ATM outperforms the strongest active learning and self-training baselines and improves label efficiency by 51.9% on average.
arXiv Detail & Related papers (2021-12-16T11:09:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.