Label Augmentation with Reinforced Labeling for Weak Supervision
- URL: http://arxiv.org/abs/2204.06436v1
- Date: Wed, 13 Apr 2022 14:54:02 GMT
- Title: Label Augmentation with Reinforced Labeling for Weak Supervision
- Authors: Gürkan Solmaz, Flavio Cirillo, Fabio Maresca, Anagha Gode Anil Kumar
- Abstract summary: This paper proposes a new approach called reinforced labeling (RL).
RL augments the LFs' outputs to cases not covered by LFs based on similarities among samples.
Experiments on several domains (classification of YouTube comments, wine quality, and weather prediction) result in considerable gains.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weak supervision (WS) is an alternative to traditional supervised
learning that addresses the need for ground-truth labels. Data programming is a
practical WS approach that allows programmatically labeling data samples with
labeling functions (LFs) instead of hand-labeling each data point. However, the
existing approach fails to fully exploit the domain knowledge encoded into LFs,
especially when the LFs' coverage is low. This is because the common data
programming pipeline does not use data features during the
generative process. This paper proposes a new approach called reinforced
labeling (RL). Given an unlabeled dataset and a set of LFs, RL augments the
LFs' outputs to cases not covered by LFs based on similarities among samples.
Thus, RL can lead to higher labeling coverage for training an end classifier.
The experiments on several domains (classification of YouTube comments, wine
quality, and weather prediction) result in considerable gains. The new approach
produces significant performance improvements of up to +21 points in
accuracy and +61 points in F1 score compared to the state-of-the-art data
programming approach.
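The abstract states the idea (extending LF outputs to samples the LFs do not cover, based on similarities among samples) but not the algorithm. The following is a minimal Python sketch of that general idea under stated assumptions, not the authors' reinforced-labeling procedure: it assumes an integer label matrix `L` of shape (samples, LFs) with `-1` meaning "abstain", a feature matrix `X`, cosine similarity, and a hypothetical `threshold` parameter. The helper names `coverage` and `augment_labels` are illustrative only.

```python
import numpy as np

ABSTAIN = -1  # assumed convention: -1 means "LF abstains on this sample"


def coverage(L):
    """Fraction of samples on which at least one LF casts a vote."""
    return float(np.mean((L != ABSTAIN).any(axis=1)))


def augment_labels(L, X, threshold=0.9):
    """Hypothetical similarity-based label augmentation (not the paper's exact RL).

    For each LF j and each sample i on which j abstains, find the most similar
    sample k that j originally covers; if their cosine similarity is at least
    `threshold`, copy j's vote on k to i.
    """
    L_aug = L.copy()
    # Row-normalize features so that dot products are cosine similarities.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    S = Xn @ Xn.T  # (n, n) similarity matrix; fine for small n only
    for j in range(L.shape[1]):
        covered = np.flatnonzero(L[:, j] != ABSTAIN)
        if covered.size == 0:
            continue  # this LF never votes, so there is nothing to propagate
        for i in np.flatnonzero(L[:, j] == ABSTAIN):
            k = covered[np.argmax(S[i, covered])]
            if S[i, k] >= threshold:
                L_aug[i, j] = L[k, j]
    return L_aug


# Toy data: two LFs, each covering a single sample out of four.
X = np.array([[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95]])
L = np.array([[1, ABSTAIN], [ABSTAIN, ABSTAIN], [ABSTAIN, 0], [ABSTAIN, ABSTAIN]])

L_aug = augment_labels(L, X, threshold=0.9)
print(coverage(L))       # 0.5
print(L_aug.tolist())    # [[1, -1], [1, -1], [-1, 0], [-1, 0]]
print(coverage(L_aug))   # 1.0
```

In this toy example, labeling coverage rises from 0.5 to 1.0 because each uncovered sample is nearly parallel to a covered one. The sketch copies votes only from samples an LF originally covers (one hop), so augmented votes are never propagated further, and it builds the full n-by-n similarity matrix, which is practical only for small datasets.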
Related papers
- Co-training for Low Resource Scientific Natural Language Inference
We propose a novel co-training method that assigns weights to the distantly supervised labels based on the training dynamics of the classifiers.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
(arXiv, 2024-06-20)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning
Divide and Contrast (DaC) aims to combine the strengths of two existing lines of source-free adaptation methods while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples and treats each group with a tailored objective.
We further align the source-like domain with the target-specific samples using a memory-bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
(arXiv, 2022-11-12)
- Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to synthesize training labels efficiently.
The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources, such as labeling functions (LFs).
Existing statistical label models typically rely only on the LFs' outputs, ignoring the instance features when modeling the underlying generative process.
(arXiv, 2022-10-06)
- Binary Classification with Positive Labeling Sources
We propose WEAPO, a simple yet competitive WS method for producing training labels without negative labeling sources.
We show WEAPO achieves the highest averaged performance on 10 benchmark datasets.
(arXiv, 2022-08-02)
- Learned Label Aggregation for Weak Supervision
We propose a data programming approach that aggregates weak supervision signals to generate labeled data easily.
The quality of the generated labels depends on a label aggregation model that aggregates all noisy labels from all LFs to infer the ground-truth labels.
We show the model can be trained using synthetically generated data and design an effective architecture for the model.
(arXiv, 2022-07-27)
- ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision
Weak supervision (WS) is a cost-effective alternative to manual data labeling.
We introduce a new algorithm ULF for Unsupervised Labeling Function correction.
ULF refines the allocation of LFs to classes by re-estimating this assignment on highly reliable cross-validated samples.
(arXiv, 2022-04-14)
- Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK).
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
(arXiv, 2021-11-01)
- Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming
A critical bottleneck in supervised machine learning is the need for large amounts of labeled data.
In this work, we propose an LF-based reweighting framework to address two critical limitations of existing approaches.
Our algorithm learns a joint model on the (same) labeled dataset used for LF induction along with any unlabeled data in a semi-supervised manner.
(arXiv, 2021-09-23)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning
Pseudo-labeling (PL) is a general semi-supervised learning (SSL) approach that, unlike consistency-regularization methods, does not rely on domain-specific data augmentations, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high-confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework, which improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process.
(arXiv, 2021-01-15)
- Instance Credibility Inference for Few-Shot Learning
Few-shot learning aims to recognize new objects with extremely limited training data for each category.
This paper presents a simple statistical approach, dubbed Instance Credibility Inference (ICI), that exploits the distribution support of unlabeled instances for few-shot learning.
Our simple approach establishes new state-of-the-art results on four widely used few-shot learning benchmark datasets.
(arXiv, 2020-03-26)
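Several entries above describe a label model that aggregates the outputs of multiple noisy LFs into a single training label. The simplest such aggregator, and a common baseline in weak supervision, is unweighted majority vote. A minimal sketch, assuming an integer label matrix with `-1` meaning "abstain"; `majority_vote` is a hypothetical helper, not an API from any of the listed papers:

```python
import numpy as np

ABSTAIN = -1  # assumed convention: -1 means "LF abstains on this sample"


def majority_vote(L, n_classes):
    """Unweighted majority vote over an (n_samples, n_lfs) integer label matrix.

    Returns ABSTAIN for samples where every LF abstains or the top classes tie.
    """
    labels = np.full(L.shape[0], ABSTAIN)
    for i, row in enumerate(L):
        votes = row[row != ABSTAIN]
        if votes.size == 0:
            continue  # no LF covers this sample
        counts = np.bincount(votes, minlength=n_classes)
        top = np.flatnonzero(counts == counts.max())
        if top.size == 1:  # leave ties unlabeled rather than break them arbitrarily
            labels[i] = top[0]
    return labels


L = np.array([[1, 1, 0], [0, ABSTAIN, ABSTAIN], [ABSTAIN] * 3, [1, 0, ABSTAIN]])
print(majority_vote(L, n_classes=2).tolist())  # [1, 0, -1, -1]
```

Majority vote uses only the LF outputs and ignores instance features, which is the limitation highlighted in the instance-feature label aggregation entry above.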