Binary Classification with Positive Labeling Sources
- URL: http://arxiv.org/abs/2208.01704v1
- Date: Tue, 2 Aug 2022 19:32:08 GMT
- Title: Binary Classification with Positive Labeling Sources
- Authors: Jieyu Zhang, Yujing Wang, Yaming Yang, Yang Luo, Alexander Ratner
- Abstract summary: We propose WEAPO, a simple yet competitive WS method for producing training labels without negative labeling sources.
We show that WEAPO achieves the highest average performance across 10 benchmark datasets.
- Score: 71.37692084951355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To create large numbers of training labels for machine learning models
effectively and efficiently, researchers have turned to Weak Supervision (WS),
which uses programmatic labeling sources rather than manual annotation.
Existing WS approaches for binary classification typically assume labeling sources
that can assign both positive and negative labels to data in roughly balanced
proportions. However, for many tasks of interest where the positive class is a
minority, negative examples can be too diverse for developers to write indicative
labeling sources. In this work, we therefore study the application of WS to binary
classification tasks with positive labeling sources only. We propose WEAPO, a
simple yet competitive WS method for
producing training labels without negative labeling sources. On 10 benchmark
datasets, we show that WEAPO achieves the highest average performance in terms of
both the quality of the synthesized labels and the performance of the final
classifier supervised with these labels. We incorporated the implementation of
WEAPO into WRENCH, an existing benchmarking platform.
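The abstract describes the positive-only setting but not WEAPO's aggregation rule itself. As a rough orientation, the sketch below shows what a positive-only weak-supervision pipeline can look like with a naive coverage-based aggregation; the labeling functions, their names, and the averaging heuristic are illustrative assumptions, not the authors' method.

```python
import numpy as np

# Two toy positive-only labeling functions: each votes 1 (positive) or 0 (abstain).
# The keyword rules below are illustrative assumptions, not WEAPO's actual sources.
def lf_mentions_refund(text: str) -> int:
    return 1 if "refund" in text.lower() else 0

def lf_mentions_complaint(text: str) -> int:
    return 1 if "complaint" in text.lower() else 0

LABELING_FUNCTIONS = [lf_mentions_refund, lf_mentions_complaint]

def synthesize_soft_labels(texts, lfs=LABELING_FUNCTIONS):
    """Combine positive-only votes into soft labels in [0, 1].

    Uses a plain fraction-of-firing-sources heuristic; WEAPO's actual
    aggregation is different. This only illustrates the problem setting,
    in which no source can ever vote "negative".
    """
    votes = np.array([[lf(t) for lf in lfs] for t in texts])  # shape (n_texts, n_lfs)
    return votes.mean(axis=1)

if __name__ == "__main__":
    docs = ["I want a refund and will file a complaint", "Thanks, great service!"]
    print(synthesize_soft_labels(docs))  # -> [1.0, 0.0]
```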
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights to the distantly supervised labels based on the training dynamics of the classifiers.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z) - Label Propagation for Zero-shot Classification with Vision-Language Models [17.50253820510074]
In this paper, we tackle the case of zero-shot classification in the presence of unlabeled data.
We introduce ZLaP, a method based on label propagation (LP) that utilizes geodesic distances for classification.
We perform extensive experiments to evaluate the effectiveness of our method on 14 common datasets and show that ZLaP outperforms the latest related works.
arXiv Detail & Related papers (2024-04-05T12:58:07Z) - Dist-PU: Positive-Unlabeled Learning from a Label Distribution
Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper. Motivated by this perspective, we pursue consistency between the predicted and ground-truth label distributions (a simplified sketch of this idea appears after the related-papers list).
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z) - CLS: Cross Labeling Supervision for Semi-Supervised Learning [9.929229055862491]
Cross Labeling Supervision (CLS) is a framework that generalizes the typical pseudo-labeling process.
CLS allows the creation of both pseudo and complementary labels to support both positive and negative learning.
arXiv Detail & Related papers (2022-02-17T08:09:40Z) - WRENCH: A Comprehensive Benchmark for Weak Supervision [66.82046201714766]
The benchmark consists of 22 varied real-world datasets for classification and sequence tagging.
We use the benchmark to conduct extensive comparisons of more than 100 method variants, demonstrating its efficacy as a benchmark platform.
arXiv Detail & Related papers (2021-09-23T13:47:16Z) - Weakly Supervised Classification Using Group-Level Labels [12.285265254225166]
We propose methods to use group-level binary labels as weak supervision to train instance-level binary classification models.
We model group-level labels as Class Conditional Noisy (CCN) labels for individual instances and use the noisy labels to regularize predictions of the model trained on the strongly-labeled instances.
arXiv Detail & Related papers (2021-08-16T20:01:45Z) - Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced
Semi-Supervised Learning [80.05441565830726]
This paper addresses imbalanced semi-supervised learning, where heavily biased pseudo-labels can harm the model performance.
Motivated by this observation, we propose a general pseudo-labeling framework to address the bias.
We term this framework for imbalanced SSL the Distribution-Aware Semantics-Oriented (DASO) Pseudo-label.
arXiv Detail & Related papers (2021-06-10T11:58:25Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - Boosting the Performance of Semi-Supervised Learning with Unsupervised
Clustering [10.033658645311188]
We show that intermittently ignoring labels altogether for whole epochs during training can significantly improve performance in the small-sample regime (a minimal sketch of this schedule appears below).
We demonstrate our method's efficacy in boosting several state-of-the-art SSL algorithms.
arXiv Detail & Related papers (2020-12-01T14:19:14Z)