Improved Adaptive Algorithm for Scalable Active Learning with Weak
Labeler
- URL: http://arxiv.org/abs/2211.02233v1
- Date: Fri, 4 Nov 2022 02:52:54 GMT
- Title: Improved Adaptive Algorithm for Scalable Active Learning with Weak
Labeler
- Authors: Yifang Chen, Karthik Sankararaman, Alessandro Lazaric, Matteo Pirotta,
Dmytro Karamshuk, Qifan Wang, Karishma Mandyam, Sinong Wang, Han Fang
- Abstract summary: Weak Labeler Active Cover (WL-AC) robustly leverages lower-quality weak labelers to reduce query complexity while retaining the desired level of accuracy.
We show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
- Score: 89.27610526884496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active learning with strong and weak labelers considers a practical setting
where we have access to both costly but accurate strong labelers and inaccurate
but cheap predictions provided by weak labelers. We study this problem in the
streaming setting, where decisions must be taken \textit{online}. We design a
novel algorithmic template, Weak Labeler Active Cover (WL-AC), that is able to
robustly leverage the lower quality weak labelers to reduce the query
complexity while retaining the desired level of accuracy. Prior active learning
algorithms with access to weak labelers learn a difference classifier that
predicts where the weak labels differ from the strong labels; this requires the
strong assumption of realizability of the difference classifier (Zhang and
Chaudhuri, 2015). WL-AC bypasses this \textit{realizability} assumption and is
thus applicable to many real-world scenarios, such as randomly corrupted weak
labels and high-dimensional families of difference classifiers (\textit{e.g.,}
deep neural nets). Moreover, WL-AC cleverly trades off evaluating the quality
of weak labelers against fully exploiting them, which makes it possible to
convert any active learning strategy into one that can leverage weak labelers.
We provide an instantiation of this template that achieves the optimal query
complexity for any given weak labeler, without knowing its accuracy a priori.
Empirically, we propose an instantiation of the WL-AC template that can be
efficiently implemented for large-scale models (\textit{e.g.,} deep neural
nets) and show its effectiveness
on the corrupted-MNIST dataset by significantly reducing the number of labels
while keeping the same accuracy as in passive learning.
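To make the evaluate-versus-exploit trade-off concrete, below is a minimal, hypothetical sketch of a WL-AC-style query rule. It is not the paper's algorithm: the `prediction_margin` oracle, the audit rate, and the Laplace-smoothed agreement estimate are illustrative stand-ins for the cover-based query probabilities that WL-AC actually computes.

```python
import random

class WeakLabelerActiveLearner:
    """Illustrative WL-AC-style query rule (hypothetical, not the paper's algorithm)."""

    def __init__(self, uncertainty_threshold=0.2, audit_rate=0.1):
        self.uncertainty_threshold = uncertainty_threshold
        self.audit_rate = audit_rate  # fraction of weak labels audited with strong queries
        self.agreements = 1           # Laplace-smoothed agreement counts
        self.audits = 2

    def agreement_estimate(self):
        # Running estimate of the weak labeler's quality, learned online
        # rather than assumed known a priori.
        return self.agreements / self.audits

    def label_for(self, x, model, weak_labeler, strong_labeler):
        # Confident region: query nobody, matching the usual active-learning
        # behavior of skipping easy examples in the stream.
        if model.prediction_margin(x) > self.uncertainty_threshold:  # assumed API
            return None
        weak_y = weak_labeler(x)
        # Audit occasionally, or fall back to the strong labeler whenever the
        # weak labeler's estimated agreement rate looks too low to trust.
        if random.random() < self.audit_rate or random.random() > self.agreement_estimate():
            strong_y = strong_labeler(x)  # costly query
            self.audits += 1
            self.agreements += int(strong_y == weak_y)
            return strong_y
        return weak_y                     # cheap query
```

Under this rule the strong labeler is queried often while the weak labeler's quality is still uncertain, and progressively less as its observed agreement rate climbs, mirroring the "evaluate, then fully exploit" behavior the abstract describes.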
Related papers
- Robust Representation Learning for Unreliable Partial Label Learning [86.909511808373]
Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth.
When the candidate labels themselves may be unreliable, the problem is known as Unreliable Partial Label Learning (UPLL), which introduces additional complexity due to the inherent unreliability and ambiguity of partial labels.
We propose the Unreliability-Robust Representation Learning framework (URRL), which leverages unreliability-robust contrastive learning to help the model withstand unreliable partial labels effectively.
arXiv Detail & Related papers (2023-08-31T13:37:28Z)
- AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to large amounts of unlabeled data.
arXiv Detail & Related papers (2023-02-07T07:12:05Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
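As a rough illustration of that idea (not the paper's implementation, which tracks the confidence distribution with running estimates and adds a uniform-alignment step), a truncated-Gaussian weighting over per-sample confidence might look like:

```python
import torch

def soft_confidence_weights(probs: torch.Tensor) -> torch.Tensor:
    # probs: (batch, num_classes) softmax outputs on unlabeled data.
    conf = probs.max(dim=-1).values  # per-sample confidence
    mu = conf.mean().detach()        # batch statistics stand in for the
    var = conf.var().detach()        # paper's EMA-tracked estimates
    gauss = torch.exp(-((conf - mu) ** 2) / (2 * var + 1e-8))
    # Samples at or above the mean confidence keep full weight; the rest are
    # down-weighted smoothly instead of being dropped by a hard threshold.
    return torch.where(conf >= mu, torch.ones_like(gauss), gauss)
```

Every pseudo-labeled sample then contributes to the unlabeled loss, weighted by these values, rather than only those passing a fixed confidence cutoff.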
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Losses over Labels: Weakly Supervised Learning via Direct Loss Construction [71.11337906077483]
Programmable weak supervision is a growing paradigm within machine learning.
We propose Losses over Labels (LoL), which creates losses directly from labeling functions without going through the intermediate step of a label.
We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks.
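One hedged reading of that idea in code (the paper's actual loss construction is richer): rather than aggregating labeling-function votes into a single pseudo-label and then applying a loss, one loss term per labeling function is averaged directly.

```python
import torch
import torch.nn.functional as F

def losses_over_labels_sketch(logits: torch.Tensor, lf_votes: list) -> torch.Tensor:
    # logits: (batch, num_classes); lf_votes: one (batch,) label tensor per
    # labeling function. Each function contributes its own loss term, so no
    # aggregated pseudo-label is ever formed (abstentions ignored for brevity).
    return torch.stack([F.cross_entropy(logits, votes) for votes in lf_votes]).mean()
```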
arXiv Detail & Related papers (2022-12-13T22:29:14Z)
- Transductive CLIP with Class-Conditional Contrastive Learning [68.51078382124331]
We propose Transductive CLIP, a novel framework for learning a classification network with noisy labels from scratch.
A class-conditional contrastive learning mechanism is proposed to mitigate the reliance on pseudo labels.
An ensemble-labels strategy is adopted for pseudo-label updating to stabilize the training of deep neural networks with noisy labels.
arXiv Detail & Related papers (2022-06-13T14:04:57Z)
- Label Noise-Resistant Mean Teaching for Weakly Supervised Fake News Detection [93.6222609806278]
We propose a novel label noise-resistant mean teaching approach (LNMT) for weakly supervised fake news detection.
LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data.
LNMT establishes a mean teacher framework equipped with label propagation and label reliability estimation.
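The mean-teacher backbone of such a framework is standard; a generic EMA update (a textbook sketch, not LNMT's code) looks like:

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.999) -> None:
    # The teacher's weights track an exponential moving average of the
    # student's, smoothing out the influence of individual noisy labels.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1 - decay)
```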
arXiv Detail & Related papers (2022-06-10T16:01:58Z)
- ATM: An Uncertainty-aware Active Self-training Framework for Label-efficient Text Classification [13.881283744970979]
ATM is a new framework that leverages self-training to exploit unlabeled data and is agnostic to the specific AL algorithm.
We demonstrate that ATM outperforms the strongest active learning and self-training baselines and improves label efficiency by 51.9% on average.
arXiv Detail & Related papers (2021-12-16T11:09:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.