ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision
- URL: http://arxiv.org/abs/2204.06863v4
- Date: Wed, 3 Jan 2024 20:52:22 GMT
- Title: ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision
- Authors: Anastasiia Sedova, Benjamin Roth
- Abstract summary: Weak supervision (WS) is a cost-effective alternative to manual data labeling.
We introduce a new algorithm ULF for Unsupervised Labeling Function correction.
ULF refines the allocation of LFs to classes by re-estimating this assignment on highly reliable cross-validated samples.
- Score: 5.566060402907773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A cost-effective alternative to manual data labeling is weak supervision
(WS), where data samples are automatically annotated using a predefined set of
labeling functions (LFs), rule-based mechanisms that generate artificial labels
for the associated classes. In this work, we investigate noise reduction
techniques for WS based on the principle of k-fold cross-validation. We
introduce a new algorithm ULF for Unsupervised Labeling Function correction,
which denoises WS data by leveraging models trained on all but some LFs to
identify and correct biases specific to the held-out LFs. Specifically, ULF
refines the allocation of LFs to classes by re-estimating this assignment on
highly reliable cross-validated samples. Evaluation on multiple datasets
confirms ULF's effectiveness in enhancing WS learning without the need for
manual labeling.
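The procedure described in the abstract can be made concrete with a minimal sketch: split the LFs (not the samples) into k folds, train a model on labels aggregated only from the kept LFs, and use its confident predictions to re-estimate which class each held-out LF should vote for. Everything below is an illustrative assumption rather than the authors' implementation: the names (`train_fn`, `majority_vote`), the one-hot LF-to-class matrix, the majority-vote aggregation, the 0.9 confidence threshold, and the sklearn-style `predict_proba` interface are not taken from the paper.

```python
# Sketch of LF-holdout cross-validation for weak supervision, loosely
# following the idea in the abstract above. Not the authors' code.
import numpy as np


def majority_vote(lf_matrix, lf_to_class, n_classes):
    """Aggregate binary LF firings (samples x LFs) into hard labels via the
    current LF-to-class assignment (LFs x classes, one-hot rows)."""
    votes = lf_matrix @ lf_to_class                      # samples x classes
    covered = votes.sum(axis=1) > 0
    labels = np.full(lf_matrix.shape[0], -1)
    labels[covered] = votes[covered].argmax(axis=1)
    return labels, covered


def ulf_style_correction(X, lf_matrix, lf_to_class, n_classes,
                         train_fn, k=5, threshold=0.9):
    """Re-estimate the LF-to-class assignment with k-fold holdout over LFs."""
    n_lfs = lf_matrix.shape[1]
    folds = np.array_split(np.random.permutation(n_lfs), k)
    counts = np.zeros((n_lfs, n_classes))

    for held_out in folds:
        kept = np.setdiff1d(np.arange(n_lfs), held_out)
        # Weak labels derived only from the kept LFs.
        y, covered = majority_vote(lf_matrix[:, kept], lf_to_class[kept], n_classes)
        # Any sklearn-style classifier; assumes all classes occur in the weak labels.
        model = train_fn(X[covered], y[covered])
        proba = model.predict_proba(X)                    # samples x classes

        # On confidently predicted ("highly reliable") samples, record which
        # class each held-out LF tends to fire on.
        confident = proba.max(axis=1) >= threshold
        pred = proba.argmax(axis=1)
        for j in held_out:
            fired = (lf_matrix[:, j] == 1) & confident
            for c in range(n_classes):
                counts[j, c] += np.sum(pred[fired] == c)

    # Each LF is reassigned to the class it most often agrees with on the
    # reliable samples; LFs that never fired confidently keep their class.
    new_assignment = lf_to_class.copy()
    seen = counts.sum(axis=1) > 0
    new_assignment[seen] = np.eye(n_classes)[counts[seen].argmax(axis=1)]
    return new_assignment
```

In a full pipeline, the corrected assignment would then be used to re-aggregate the weak labels and retrain the end model; the paper's actual re-estimation and denoising steps may differ from this sketch.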
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns importance weights to the distantly supervised labels based on the training dynamics of the classifiers (a generic weighting sketch follows this entry).
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
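To make the contrast in the entry above concrete, here is a generic sketch of soft importance weighting versus hard confidence filtering of distantly supervised labels. The specific weighting signal (a peer classifier's confidence) and the loss form are assumptions for illustration; the paper derives its weights from the classifiers' training dynamics.

```python
# Generic sketch: soft per-example weighting vs. hard threshold filtering
# of noisy (distantly supervised) labels. Illustrative only.
import numpy as np


def weighted_cross_entropy(probs, noisy_labels, weights):
    """Soft weighting: every example contributes, scaled by a weight in [0, 1].
    probs: (n, n_classes) predicted probabilities; noisy_labels: (n,) ints."""
    per_example = -np.log(probs[np.arange(len(noisy_labels)), noisy_labels] + 1e-12)
    return float(np.mean(weights * per_example))


def hard_filtered_cross_entropy(probs, noisy_labels, confidence, threshold=0.8):
    """Baseline contrast: drop every example below an arbitrary threshold."""
    keep = confidence >= threshold
    if not keep.any():
        return 0.0
    kept_probs = probs[keep][np.arange(keep.sum()), noisy_labels[keep]]
    return float(np.mean(-np.log(kept_probs + 1e-12)))
```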
- Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation [13.486951040331899]
Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data.
Existing methods rely on constructing pseudo-labels for all samples and updating the model through self-training.
We propose Pseudo Labeling Filter (PLF) to improve the quality of pseudo-labels.
arXiv Detail & Related papers (2024-06-03T04:09:36Z)
- Uncertainty-Aware Pseudo-Label Filtering for Source-Free Unsupervised Domain Adaptation [45.53185386883692]
Source-free unsupervised domain adaptation (SFUDA) aims to enable the utilization of a pre-trained source model in an unlabeled target domain without access to source data.
We propose a method called Uncertainty-aware Pseudo-label-filtering Adaptation (UPA) to efficiently address this issue in a coarse-to-fine manner (a generic filtering sketch follows this entry).
arXiv Detail & Related papers (2024-03-17T16:19:40Z)
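As a minimal illustration of pseudo-label filtering in general (not UPA's coarse-to-fine procedure, whose details are not given in this summary), the sketch below keeps only pseudo-labels with low predictive entropy; the entropy criterion and threshold are assumptions.

```python
# Generic sketch of uncertainty-based pseudo-label filtering for self-training.
import numpy as np


def filter_pseudo_labels(probs, max_entropy=0.5):
    """probs: (n_samples, n_classes) softmax outputs of the current model.
    Returns the kept pseudo-labels and the indices of the kept samples."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    keep = entropy <= max_entropy
    pseudo = probs.argmax(axis=1)
    return pseudo[keep], np.where(keep)[0]
```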
- Parameter-tuning-free data entry error unlearning with adaptive selective synaptic dampening [51.34904967046097]
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning.
We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks.
The application of this approach is particularly compelling in industrial settings, such as supply chain management.
arXiv Detail & Related papers (2024-02-06T14:04:31Z)
- Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision [75.1860418333995]
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to synthesize training labels efficiently.
The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources, i.e., labeling functions (LFs).
Existing statistical label models typically rely only on the outputs of the LFs, ignoring instance features when modeling the underlying generative process (a minimal output-only label model is sketched after this entry).
arXiv Detail & Related papers (2022-10-06T07:28:53Z)
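For context on what the label model in the entry above does, here is the simplest purely output-based aggregator: a weighted majority vote over LF outputs, with -1 marking an abstaining LF. The entry's point is that such models ignore instance features; the abstain convention and the per-LF weights here are illustrative assumptions, not the paper's model.

```python
# Minimal output-only label model for programmatic weak supervision:
# a weighted majority vote over LF outputs (-1 = abstain). Illustrative only.
import numpy as np


def majority_vote_label_model(lf_outputs, n_classes, lf_weights=None):
    """lf_outputs: (n_samples, n_lfs) array of class ids, -1 for abstain."""
    n_samples, n_lfs = lf_outputs.shape
    if lf_weights is None:
        lf_weights = np.ones(n_lfs)          # e.g. estimated LF accuracies
    scores = np.zeros((n_samples, n_classes))
    for j in range(n_lfs):
        fired = lf_outputs[:, j] >= 0
        scores[fired, lf_outputs[fired, j]] += lf_weights[j]
    # Samples where every LF abstained get label -1.
    return np.where(scores.sum(axis=1) > 0, scores.argmax(axis=1), -1)
```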
- Sparse Conditional Hidden Markov Model for Weakly Supervised Named Entity Recognition [68.68300358332156]
We propose the sparse conditional hidden Markov model (Sparse-CHMM) to evaluate noisy labeling functions.
Sparse-CHMM is optimized through unsupervised learning with a three-stage training pipeline.
It achieves a 3.01 average F1 score improvement on five comprehensive datasets.
arXiv Detail & Related papers (2022-05-27T20:47:30Z)
- Label Augmentation with Reinforced Labeling for Weak Supervision [0.1529342790344802]
This paper proposes a new approach called reinforced labeling (RL).
RL augments the LFs' outputs to cases not covered by LFs, based on similarities among samples (a generic propagation sketch follows this entry).
Experiments on several domains (classification of YouTube comments, wine quality, and weather prediction) result in considerable gains.
arXiv Detail & Related papers (2022-04-13T14:54:02Z)
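A generic way to realize the similarity-based augmentation described in the entry above is nearest-neighbour propagation of weak labels from LF-covered samples to uncovered ones. The distance metric, neighbourhood size, and agreement rule below are assumptions, not the reinforced-labeling algorithm itself.

```python
# Generic sketch of similarity-based label augmentation: propagate weak labels
# from LF-covered samples to uncovered ones via nearest neighbours. Illustrative only.
import numpy as np


def propagate_weak_labels(X, weak_labels, k=5):
    """weak_labels: array of class ids with -1 for samples no LF covers."""
    covered = weak_labels >= 0
    if not covered.any():
        return weak_labels.copy()
    X_cov, y_cov = X[covered], weak_labels[covered]
    augmented = weak_labels.copy()
    for i in np.where(~covered)[0]:
        dists = np.linalg.norm(X_cov - X[i], axis=1)
        nearest = np.argsort(dists)[:k]
        votes = np.bincount(y_cov[nearest])
        # Only assign a label if the neighbourhood clearly agrees.
        if votes.max() >= (k // 2) + 1:
            augmented[i] = votes.argmax()
    return augmented
```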
- Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming [14.639568384768042]
A critical bottleneck in supervised machine learning is the need for large amounts of labeled data.
In this work, we propose an LF-based reweighting framework to address these critical limitations.
Our algorithm learns a joint model on the (same) labeled dataset used for LF induction along with any unlabeled data in a semi-supervised manner.
arXiv Detail & Related papers (2021-09-23T14:42:46Z)
- Cycle Self-Training for Domain Adaptation [85.14659717421533]
Cycle Self-Training (CST) is a principled self-training algorithm that enforces pseudo-labels to generalize across domains.
CST recovers target ground truth, while both invariant feature learning and vanilla self-training fail.
Empirical results indicate that CST significantly improves over prior state-of-the-art methods on standard UDA benchmarks.
arXiv Detail & Related papers (2021-03-05T10:04:25Z)
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its SSL and TL counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.