Learned Label Aggregation for Weak Supervision
- URL: http://arxiv.org/abs/2207.13545v1
- Date: Wed, 27 Jul 2022 14:36:35 GMT
- Title: Learned Label Aggregation for Weak Supervision
- Authors: Renzhi Wu, Shen-En Chen, Xu Chu
- Abstract summary: We propose a data programming approach that aggregates weak supervision signals to generate labeled data easily.
The quality of the generated labels depends on a label aggregation model that aggregates all noisy labels from all LFs to infer the ground-truth labels.
We show the model can be trained using synthetically generated data and design an effective architecture for the model.
- Score: 8.819582879892762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lack of labeled training data is the bottleneck of machine learning in
many applications. To resolve the bottleneck, one promising direction is the
data programming approach that aggregates different sources of weak supervision
signals to generate labeled data easily. Data programming encodes each weak
supervision source with a labeling function (LF), a user-provided program that
predicts noisy labels. The quality of the generated labels depends on a label
aggregation model that aggregates all noisy labels from all LFs to infer the
ground-truth labels.
Existing label aggregation methods typically rely on various assumptions and
are not robust across datasets, as we show empirically. We provide, for the
first time, an analytical label aggregation method that makes minimal
assumptions and is optimal in minimizing a certain form of the averaged
prediction error. Since the complexity of the analytical form is exponential,
we train a model that learns to approximate the analytical method. Once
trained, the model can be applied to any unseen dataset and predicts the
ground-truth labels for that dataset in a single forward pass in linear time.
We show the model can be trained using synthetically generated data and design
an effective architecture for the model. On 14 real-world datasets, our model
significantly outperforms the best existing methods in both accuracy (by 3.5
points on average) and efficiency (by six times on average).
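The paper's learned aggregator is not reproduced here, but the label-aggregation setting it targets can be sketched with a simple majority-vote baseline over a synthetically generated LF output matrix. Everything below (the {-1, 0, +1} label convention with 0 for abstain, the LF accuracies, and the coverage rate) is an illustrative assumption, not the paper's actual model or code:

```python
import numpy as np

def majority_vote(L):
    """Aggregate noisy LF labels by majority vote.

    L: (n_samples, n_lfs) matrix with entries in {-1, 0, +1},
    where 0 means the LF abstained on that sample.
    Ties and all-abstain rows default to +1.
    """
    votes = L.sum(axis=1)
    return np.where(votes >= 0, 1, -1)

rng = np.random.default_rng(0)
n, m = 1000, 5
y = rng.choice([-1, 1], size=n)          # hidden ground-truth labels
acc = rng.uniform(0.6, 0.8, size=m)      # each LF's accuracy when it fires
coverage = 0.7                           # probability an LF votes at all

# Simulate the noisy LF output matrix.
L = np.zeros((n, m), dtype=int)
for j in range(m):
    fires = rng.random(n) < coverage
    correct = rng.random(n) < acc[j]
    L[fires, j] = np.where(correct[fires], y[fires], -y[fires])

y_hat = majority_vote(L)
print((y_hat == y).mean())  # typically well above any single LF's accuracy
```

Majority vote implicitly assumes all LFs are equally accurate and conditionally independent; the learned aggregation model described in the abstract is precisely meant to replace this kind of fixed heuristic with a model that generalizes across datasets.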
Related papers
- Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Deep Partial Multi-Label Learning with Graph Disambiguation [27.908565535292723]
We propose a novel deep Partial multi-Label model with grAph-disambIguatioN (PLAIN).
Specifically, we introduce the instance-level and label-level similarities to recover label confidences.
At each training epoch, labels are propagated on the instance and label graphs to produce relatively accurate pseudo-labels.
arXiv Detail & Related papers (2023-05-10T04:02:08Z)
- Ground Truth Inference for Weakly Supervised Entity Matching [76.6732856489872]
We propose a simple but powerful labeling model for weak supervision tasks.
We then tailor the labeling model specifically to the task of entity matching.
We show that our labeling model results in a 9% higher F1 score on average than the best existing method.
arXiv Detail & Related papers (2022-11-13T17:57:07Z)
- Data Consistency for Weakly Supervised Learning [15.365232702938677]
Training machine learning models involves using large amounts of human-annotated data.
We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals.
We show that it significantly outperforms state-of-the-art weak supervision methods on both text and image classification tasks.
arXiv Detail & Related papers (2022-02-08T16:48:19Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, enjoys its adaptivity in terms of unlabeled data selection.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- Self-Supervised Noisy Label Learning for Source-Free Unsupervised Domain Adaptation [87.60688582088194]
We propose a novel Self-Supervised Noisy Label Learning method.
Our method can easily achieve state-of-the-art results and surpass other methods by a very large margin.
arXiv Detail & Related papers (2021-02-23T10:51:45Z)
- Analysis of label noise in graph-based semi-supervised learning [2.4366811507669124]
In machine learning, one must acquire labels to help supervise a model that will be able to generalize to unseen data.
It is often the case that most of our data is unlabeled.
Semi-supervised learning (SSL) alleviates that by making strong assumptions about the relation between the labels and the input data distribution.
arXiv Detail & Related papers (2020-09-27T22:13:20Z)
- Unsupervised Pool-Based Active Learning for Linear Regression [29.321275647107928]
This paper studies unsupervised pool-based AL for linear regression problems.
We propose a novel AL approach that considers simultaneously the informativeness, representativeness, and diversity, three essential criteria in AL.
arXiv Detail & Related papers (2020-01-14T20:00:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.