Constrained Labeling for Weakly Supervised Learning
- URL: http://arxiv.org/abs/2009.07360v5
- Date: Sat, 29 May 2021 19:51:20 GMT
- Title: Constrained Labeling for Weakly Supervised Learning
- Authors: Chidubem Arachie, Bert Huang
- Abstract summary: We propose a simple data-free approach for combining weak supervision signals.
Our method is efficient and stable, converging after a few iterations of gradient descent.
We show experimentally that our method outperforms other weak supervision methods on various text- and image-classification tasks.
- Score: 15.365232702938677
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Curation of large fully supervised datasets has become one of the major
roadblocks for machine learning. Weak supervision provides an alternative to
supervised learning by training with cheap, noisy, and possibly correlated
labeling functions from varying sources. The key challenge in weakly supervised
learning is combining the different weak supervision signals while navigating
misleading correlations in their errors. In this paper, we propose a simple
data-free approach for combining weak supervision signals by defining a
constrained space for the possible labels of the weak signals and training with
a random labeling within this constrained space. Our method is efficient and
stable, converging after a few iterations of gradient descent. We prove
theoretical conditions under which the worst-case error of the randomized label
decreases with the rank of the linear constraints. We show experimentally that
our method outperforms other weak supervision methods on various text- and
image-classification tasks.
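The abstract describes the approach only at a high level. Below is a minimal, hedged sketch of the general idea in NumPy, not the paper's exact formulation: it assumes the weak signals are binary soft labels q_j in [0,1]^n, that the constrained label space is defined by per-signal expected-error bounds b_j (linear constraints in y), and that a random labeling is pushed into this constrained space by a few gradient-descent steps on the constraint violation. All function and variable names are illustrative.

```python
import numpy as np

def constrained_random_labeling(weak_signals, error_bounds, n_iters=200, lr=0.1, seed=0):
    """Sketch: find a soft labeling y in [0, 1]^n that satisfies linear
    error constraints implied by the weak signals, starting from a random point.
    Assumed constraint form: the expected disagreement of each weak signal q_j
    with y is at most its error bound b_j."""
    m, n = weak_signals.shape            # m weak signals, n unlabeled examples
    rng = np.random.default_rng(seed)
    y = rng.uniform(size=n)              # random labeling inside [0, 1]^n

    for _ in range(n_iters):
        # Expected error of each weak signal w.r.t. the current soft labels:
        # err_j = (1/n) * sum_i [ q_ji * (1 - y_i) + (1 - q_ji) * y_i ]
        errors = (weak_signals * (1 - y) + (1 - weak_signals) * y).mean(axis=1)
        violation = np.maximum(errors - error_bounds, 0.0)   # only violated constraints matter
        if violation.max() <= 1e-6:
            break                        # a feasible labeling has been found
        # Gradient of the (squared-hinge) constraint violation with respect to y
        grad = ((1 - 2 * weak_signals) / n).T @ violation
        y = np.clip(y - lr * grad, 0.0, 1.0)
    return y

# Toy usage: 3 weak signals over 5 examples, each assumed to err at most 30% of the time.
q = np.array([[1, 1, 0, 0, 1],
              [1, 0, 0, 1, 1],
              [0, 1, 0, 0, 1]], dtype=float)
labels = constrained_random_labeling(q, error_bounds=np.full(3, 0.3))
print(labels.round(2))
```

The sketch only finds a single feasible point; the paper's method additionally relies on the randomness of the starting labeling within the constrained space (the "randomized label" whose worst-case error is bounded in terms of the rank of the linear constraints), and handles multi-class signals and error-bound estimation, which are omitted here.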
Related papers
- Weakly Supervised Label Learning Flows [8.799674132085931]
We develop label learning flows (LLF), a general framework for weakly supervised learning problems.
Our method is a generative model based on normalizing flows.
Experimental results show that our method outperforms many baselines we compare against.
arXiv Detail & Related papers (2023-02-19T18:31:44Z) - SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z) - Losses over Labels: Weakly Supervised Learning via Direct Loss Construction [71.11337906077483]
Programmable weak supervision is a growing paradigm within machine learning.
We propose Losses over Labels (LoL), which creates losses directly from the weak supervision heuristics without going through the intermediate step of a label.
We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks.
arXiv Detail & Related papers (2022-12-13T22:29:14Z) - Label Noise-Resistant Mean Teaching for Weakly Supervised Fake News Detection [93.6222609806278]
We propose a novel label noise-resistant mean teaching approach (LNMT) for weakly supervised fake news detection.
LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data.
LNMT establishes a mean teacher framework equipped with label propagation and label reliability estimation.
arXiv Detail & Related papers (2022-06-10T16:01:58Z) - Data Consistency for Weakly Supervised Learning [15.365232702938677]
Training machine learning models involves using large amounts of human-annotated data.
We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals.
We show that it significantly outperforms state-of-the-art weak supervision methods on both text and image classification tasks.
arXiv Detail & Related papers (2022-02-08T16:48:19Z) - Barely-Supervised Learning: Semi-Supervised Learning with very few labeled images [16.905389887406894]
We analyze in depth the behavior of a state-of-the-art semi-supervised method, FixMatch, which relies on a weakly-augmented version of an image to obtain supervision signal.
We show that it frequently fails in barely-supervised scenarios, due to a lack of training signal when no pseudo-label can be predicted with high confidence.
We propose a method to leverage self-supervised methods that provides training signal in the absence of confident pseudo-labels.
arXiv Detail & Related papers (2021-12-22T16:29:10Z) - S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR10 and CIFAR100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - Denoising Multi-Source Weak Supervision for Neural Text Classification [9.099703420721701]
We study the problem of learning neural text classifiers without using any labeled data, but only easy-to-provide rules as multiple weak supervision sources.
This problem is challenging because rule-induced weak labels are often noisy and incomplete.
We design a label denoiser, which estimates the source reliability using a conditional soft attention mechanism and then reduces label noise by aggregating rule-annotated weak labels.
arXiv Detail & Related papers (2020-10-09T13:57:52Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)