Losses over Labels: Weakly Supervised Learning via Direct Loss
Construction
- URL: http://arxiv.org/abs/2212.06921v2
- Date: Wed, 4 Oct 2023 23:32:44 GMT
- Title: Losses over Labels: Weakly Supervised Learning via Direct Loss
Construction
- Authors: Dylan Sam, J. Zico Kolter
- Abstract summary: Programmatic weak supervision is a growing paradigm within machine learning.
We propose Losses over Labels (LoL), which creates losses directly from heuristics without going through the intermediate step of a label.
We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks.
- Score: 71.11337906077483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Owing to the prohibitive costs of generating large amounts of labeled data,
programmatic weak supervision is a growing paradigm within machine learning. In
this setting, users design heuristics that provide noisy labels for subsets of
the data. These weak labels are combined (typically via a graphical model) to
form pseudolabels, which are then used to train a downstream model. In this
work, we question a foundational premise of the typical weakly supervised
learning pipeline: given that the heuristic provides all "label" information,
why do we need to generate pseudolabels at all? Instead, we propose to directly
transform the heuristics themselves into corresponding loss functions that
penalize differences between our model and the heuristic. By constructing
losses directly from the heuristics, we can incorporate more information than
is used in the standard weakly supervised pipeline, such as how the heuristics
make their decisions, which explicitly informs feature selection during
training. We call our method Losses over Labels (LoL) as it creates losses
directly from heuristics without going through the intermediate step of a
label. We show that LoL improves upon existing weak supervision methods on
several benchmark text and image classification tasks and further demonstrate
that incorporating gradient information leads to better performance on almost
every task.
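To make the idea concrete, here is a minimal, hypothetical PyTorch-style sketch of a loss built directly from heuristics, following the description in the abstract: each heuristic contributes an agreement term on the examples where it fires, and an optional input-gradient term nudges the model toward the features that heuristic uses. The function name lol_loss, the tensor layout, and the exact form of the gradient term are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def lol_loss(model, x, heuristic_votes, heuristic_masks,
             feature_masks=None, grad_weight=0.1):
    """Build a training loss directly from heuristics (no pseudolabels).

    heuristic_votes: list of LongTensors; votes[j][i] is heuristic j's label for x[i].
    heuristic_masks: list of BoolTensors; masks[j][i] is True where heuristic j fires.
    feature_masks:   optional list of FloatTensors (same shape as x) marking the
                     input features each heuristic relies on (hypothetical).
    """
    x = x.clone().requires_grad_(feature_masks is not None)
    logits = model(x)
    total = logits.new_zeros(())

    for j, (votes, fires) in enumerate(zip(heuristic_votes, heuristic_masks)):
        if not fires.any():
            continue

        # Agreement term: penalize the model for disagreeing with heuristic j,
        # but only on the examples where that heuristic actually fires.
        total = total + F.cross_entropy(logits[fires], votes[fires])

        if feature_masks is not None:
            # Gradient term: encourage the model to base its decision on the
            # same input features the heuristic looks at, by penalizing input
            # gradients that fall outside the heuristic's feature mask.
            score = logits[fires].gather(1, votes[fires].unsqueeze(1)).sum()
            grads, = torch.autograd.grad(score, x, create_graph=True)
            off_mask = 1.0 - feature_masks[j][fires]
            total = total + grad_weight * (grads[fires] * off_mask).abs().mean()

    return total
```

In practice such a term would be added to an ordinary supervised loss on any labeled examples; the point of the sketch is only the structural difference from first aggregating heuristic votes into pseudolabels.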
Related papers
- Reduction-based Pseudo-label Generation for Instance-dependent Partial Label Learning [41.345794038968776]
We propose to leverage reduction-based pseudo-labels to alleviate the influence of incorrect candidate labels.
We show that reduction-based pseudo-labels exhibit greater consistency with the Bayes optimal classifier compared to pseudo-labels directly generated from the predictive model.
arXiv Detail & Related papers (2024-10-28T07:32:20Z) - ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance [53.73316938815873]
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method outperforms multiple baselines by clear margins across a broad range of noise levels and scales well.
arXiv Detail & Related papers (2023-12-13T17:59:07Z) - Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z) - All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z) - AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to numerous unlabeled data.
arXiv Detail & Related papers (2023-02-07T07:12:05Z) - Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler [89.27610526884496]
Weak Labeler Active Cover (WL-AC) is able to robustly leverage the lower quality weak labelers to reduce the query complexity while retaining the desired level of accuracy.
We show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
arXiv Detail & Related papers (2022-11-04T02:52:54Z) - Semi-supervised Learning using Robust Loss [0.0]
We suggest a semi-supervised training strategy for leveraging both manually labeled data and extra unlabeled data.
In contrast to existing approaches, we apply a robust loss to the automatically labeled data to compensate for the uneven data quality.
We show that our proposed strategy improves the model performance by compensating for the uneven quality of labels in image classification.
arXiv Detail & Related papers (2022-03-03T05:34:32Z) - Data Consistency for Weakly Supervised Learning [15.365232702938677]
Training machine learning models involves using large amounts of human-annotated data.
We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals.
We show that it significantly outperforms state-of-the-art weak supervision methods on both text and image classification tasks.
arXiv Detail & Related papers (2022-02-08T16:48:19Z) - Informative Pseudo-Labeling for Graph Neural Networks with Few Labels [12.83841767562179]
Graph Neural Networks (GNNs) have achieved state-of-the-art results for semi-supervised node classification on graphs.
The challenge of how to effectively learn GNNs with very few labels is still under-explored.
We propose a novel informative pseudo-labeling framework, called InfoGNN, to facilitate learning of GNNs with extremely few labels.
arXiv Detail & Related papers (2022-01-20T01:49:30Z) - Instance-dependent Label-noise Learning under a Structural Causal Model [92.76400590283448]
Label noise degrades the performance of deep learning algorithms.
By leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning.
arXiv Detail & Related papers (2021-09-07T10:42:54Z)