ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision
- URL: http://arxiv.org/abs/2204.06863v4
- Date: Wed, 3 Jan 2024 20:52:22 GMT
- Title: ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision
- Authors: Anastasiia Sedova, Benjamin Roth
- Abstract summary: Weak supervision (WS) is a cost-effective alternative to manual data labeling.
We introduce a new algorithm ULF for Unsupervised Labeling Function correction.
ULF refines the allocation of LFs to classes by re-estimating this assignment on highly reliable cross-validated samples.
- Score: 5.566060402907773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A cost-effective alternative to manual data labeling is weak supervision
(WS), where data samples are automatically annotated using a predefined set of
labeling functions (LFs), rule-based mechanisms that generate artificial labels
for the associated classes. In this work, we investigate noise reduction
techniques for WS based on the principle of k-fold cross-validation. We
introduce a new algorithm ULF for Unsupervised Labeling Function correction,
which denoises WS data by leveraging models trained on all but some LFs to
identify and correct biases specific to the held-out LFs. Specifically, ULF
refines the allocation of LFs to classes by re-estimating this assignment on
highly reliable cross-validated samples. Evaluation on multiple datasets
confirms ULF's effectiveness in enhancing WS learning without the need for
manual labeling.
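The procedure described in the abstract can be made concrete with a minimal sketch: split the LFs (not the samples) into k folds, train a model on labels aggregated only from the kept LFs, and use its confident predictions to re-estimate which class each held-out LF should vote for. Everything below is an illustrative assumption rather than the authors' implementation: the names (`train_fn`, `majority_vote`), the one-hot LF-to-class matrix, the majority-vote aggregation, the 0.9 confidence threshold, and the sklearn-style `predict_proba` interface are not taken from the paper.

```python
# Sketch of LF-holdout cross-validation for weak supervision, loosely
# following the idea in the abstract above. Not the authors' code.
import numpy as np


def majority_vote(lf_matrix, lf_to_class, n_classes):
    """Aggregate binary LF firings (samples x LFs) into hard labels via the
    current LF-to-class assignment (LFs x classes, one-hot rows)."""
    votes = lf_matrix @ lf_to_class                      # samples x classes
    covered = votes.sum(axis=1) > 0
    labels = np.full(lf_matrix.shape[0], -1)
    labels[covered] = votes[covered].argmax(axis=1)
    return labels, covered


def ulf_style_correction(X, lf_matrix, lf_to_class, n_classes,
                         train_fn, k=5, threshold=0.9):
    """Re-estimate the LF-to-class assignment with k-fold holdout over LFs."""
    n_lfs = lf_matrix.shape[1]
    folds = np.array_split(np.random.permutation(n_lfs), k)
    counts = np.zeros((n_lfs, n_classes))

    for held_out in folds:
        kept = np.setdiff1d(np.arange(n_lfs), held_out)
        # Weak labels derived only from the kept LFs.
        y, covered = majority_vote(lf_matrix[:, kept], lf_to_class[kept], n_classes)
        # Any sklearn-style classifier; assumes all classes occur in the weak labels.
        model = train_fn(X[covered], y[covered])
        proba = model.predict_proba(X)                    # samples x classes

        # On confidently predicted ("highly reliable") samples, record which
        # class each held-out LF tends to fire on.
        confident = proba.max(axis=1) >= threshold
        pred = proba.argmax(axis=1)
        for j in held_out:
            fired = (lf_matrix[:, j] == 1) & confident
            for c in range(n_classes):
                counts[j, c] += np.sum(pred[fired] == c)

    # Each LF is reassigned to the class it most often agrees with on the
    # reliable samples; LFs that never fired confidently keep their class.
    new_assignment = lf_to_class.copy()
    seen = counts.sum(axis=1) > 0
    new_assignment[seen] = np.eye(n_classes)[counts[seen].argmax(axis=1)]
    return new_assignment
```

In a full pipeline, the corrected assignment would then be used to re-aggregate the weak labels and retrain the end model; the paper's actual re-estimation and denoising steps may differ from this sketch.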
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns importance weights to the distantly supervised labels based on the training dynamics of the classifiers (a generic weighting sketch follows this entry).
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
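To make the contrast in the entry above concrete, here is a generic sketch of soft importance weighting versus hard confidence filtering of distantly supervised labels. The specific weighting signal (a peer classifier's confidence) and the loss form are assumptions for illustration; the paper derives its weights from the classifiers' training dynamics.

```python
# Generic sketch: soft per-example weighting vs. hard threshold filtering
# of noisy (distantly supervised) labels. Illustrative only.
import numpy as np


def weighted_cross_entropy(probs, noisy_labels, weights):
    """Soft weighting: every example contributes, scaled by a weight in [0, 1].
    probs: (n, n_classes) predicted probabilities; noisy_labels: (n,) ints."""
    per_example = -np.log(probs[np.arange(len(noisy_labels)), noisy_labels] + 1e-12)
    return float(np.mean(weights * per_example))


def hard_filtered_cross_entropy(probs, noisy_labels, confidence, threshold=0.8):
    """Baseline contrast: drop every example below an arbitrary threshold."""
    keep = confidence >= threshold
    if not keep.any():
        return 0.0
    kept_probs = probs[keep][np.arange(keep.sum()), noisy_labels[keep]]
    return float(np.mean(-np.log(kept_probs + 1e-12)))
```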
- Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation [13.486951040331899]
Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data.
Existing methods rely on constructing pseudo-labels for all samples and updating the model through self-training.
We propose Pseudo Labeling Filter (PLF) to improve the quality of pseudo-labels.
arXiv Detail & Related papers (2024-06-03T04:09:36Z)
- Uncertainty-Aware Pseudo-Label Filtering for Source-Free Unsupervised Domain Adaptation [45.53185386883692]
Source-free unsupervised domain adaptation (SFUDA) aims to enable the utilization of a pre-trained source model in an unlabeled target domain without access to source data.
We propose a method called Uncertainty-aware Pseudo-label-filtering Adaptation (UPA) to efficiently address this issue in a coarse-to-fine manner (a generic filtering sketch follows this entry).
arXiv Detail & Related papers (2024-03-17T16:19:40Z)
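As a minimal illustration of pseudo-label filtering in general (not UPA's coarse-to-fine procedure, whose details are not given in this summary), the sketch below keeps only pseudo-labels with low predictive entropy; the entropy criterion and threshold are assumptions.

```python
# Generic sketch of uncertainty-based pseudo-label filtering for self-training.
import numpy as np


def filter_pseudo_labels(probs, max_entropy=0.5):
    """probs: (n_samples, n_classes) softmax outputs of the current model.
    Returns the kept pseudo-labels and the indices of the kept samples."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    keep = entropy <= max_entropy
    pseudo = probs.argmax(axis=1)
    return pseudo[keep], np.where(keep)[0]
```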
- Parameter-tuning-free data entry error unlearning with adaptive selective synaptic dampening [51.34904967046097]
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning.
We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks.
The application of this approach is particularly compelling in industrial settings, such as supply chain management.
arXiv Detail & Related papers (2024-02-06T14:04:31Z)
- Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision [75.1860418333995]
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to synthesize training labels efficiently.
The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources, i.e., labeling functions (LFs).
Existing statistical label models typically rely only on the outputs of the LFs, ignoring instance features when modeling the underlying generative process (a minimal output-only label model is sketched after this entry).
arXiv Detail & Related papers (2022-10-06T07:28:53Z)
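For context on what the label model in the entry above does, here is the simplest purely output-based aggregator: a weighted majority vote over LF outputs, with -1 marking an abstaining LF. The entry's point is that such models ignore instance features; the abstain convention and the per-LF weights here are illustrative assumptions, not the paper's model.

```python
# Minimal output-only label model for programmatic weak supervision:
# a weighted majority vote over LF outputs (-1 = abstain). Illustrative only.
import numpy as np


def majority_vote_label_model(lf_outputs, n_classes, lf_weights=None):
    """lf_outputs: (n_samples, n_lfs) array of class ids, -1 for abstain."""
    n_samples, n_lfs = lf_outputs.shape
    if lf_weights is None:
        lf_weights = np.ones(n_lfs)          # e.g. estimated LF accuracies
    scores = np.zeros((n_samples, n_classes))
    for j in range(n_lfs):
        fired = lf_outputs[:, j] >= 0
        scores[fired, lf_outputs[fired, j]] += lf_weights[j]
    # Samples where every LF abstained get label -1.
    return np.where(scores.sum(axis=1) > 0, scores.argmax(axis=1), -1)
```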
- Sparse Conditional Hidden Markov Model for Weakly Supervised Named Entity Recognition [68.68300358332156]
We propose the sparse conditional hidden Markov model (Sparse-CHMM) to evaluate noisy labeling functions.
Sparse-CHMM is optimized through unsupervised learning with a three-stage training pipeline.
It achieves a 3.01 average F1 score improvement on five comprehensive datasets.
arXiv Detail & Related papers (2022-05-27T20:47:30Z)
- Label Augmentation with Reinforced Labeling for Weak Supervision [0.1529342790344802]
This paper proposes a new approach called reinforced labeling (RL).
RL augments the LFs' outputs to cases not covered by LFs, based on similarities among samples (a generic propagation sketch follows this entry).
Experiments on several domains (classification of YouTube comments, wine quality, and weather prediction) result in considerable gains.
arXiv Detail & Related papers (2022-04-13T14:54:02Z)
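A generic way to realize the similarity-based augmentation described in the entry above is nearest-neighbour propagation of weak labels from LF-covered samples to uncovered ones. The distance metric, neighbourhood size, and agreement rule below are assumptions, not the reinforced-labeling algorithm itself.

```python
# Generic sketch of similarity-based label augmentation: propagate weak labels
# from LF-covered samples to uncovered ones via nearest neighbours. Illustrative only.
import numpy as np


def propagate_weak_labels(X, weak_labels, k=5):
    """weak_labels: array of class ids with -1 for samples no LF covers."""
    covered = weak_labels >= 0
    if not covered.any():
        return weak_labels.copy()
    X_cov, y_cov = X[covered], weak_labels[covered]
    augmented = weak_labels.copy()
    for i in np.where(~covered)[0]:
        dists = np.linalg.norm(X_cov - X[i], axis=1)
        nearest = np.argsort(dists)[:k]
        votes = np.bincount(y_cov[nearest])
        # Only assign a label if the neighbourhood clearly agrees.
        if votes.max() >= (k // 2) + 1:
            augmented[i] = votes.argmax()
    return augmented
```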
- Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming [14.639568384768042]
A critical bottleneck in supervised machine learning is the need for large amounts of labeled data.
In this work, we propose an LF-based reweighting framework to address these critical limitations.
Our algorithm learns a joint model on the (same) labeled dataset used for LF induction along with any unlabeled data in a semi-supervised manner.
arXiv Detail & Related papers (2021-09-23T14:42:46Z)
- Cycle Self-Training for Domain Adaptation [85.14659717421533]
Cycle Self-Training (CST) is a principled self-training algorithm that enforces pseudo-labels to generalize across domains.
CST recovers target ground truth, while both invariant feature learning and vanilla self-training fail.
Empirical results indicate that CST significantly improves over prior state-of-the-art methods on standard UDA benchmarks.
arXiv Detail & Related papers (2021-03-05T10:04:25Z)
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its SSL and TL counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.