Data Consistency for Weakly Supervised Learning
- URL: http://arxiv.org/abs/2202.03987v1
- Date: Tue, 8 Feb 2022 16:48:19 GMT
- Title: Data Consistency for Weakly Supervised Learning
- Authors: Chidubem Arachie, Bert Huang
- Abstract summary: Training machine learning models involves using large amounts of human-annotated data.
We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals.
We show that it significantly outperforms state-of-the-art weak supervision methods on both text and image classification tasks.
- Score: 15.365232702938677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many applications, training machine learning models involves using large
amounts of human-annotated data. Obtaining precise labels for the data is
expensive. Instead, training with weak supervision provides a low-cost
alternative. We propose a novel weak supervision algorithm that processes noisy
labels, i.e., weak signals, while also considering features of the training
data to produce accurate labels for training. Our method searches over
classifiers of the data representation to find plausible labelings. We call
this paradigm data consistent weak supervision. A key facet of our framework is
that we are able to estimate labels for data examples with low or no coverage from
the weak supervision. In addition, we make no assumptions about the joint
distribution of the weak signals and true labels of the data. Instead, we use
weak signals and the data features to solve a constrained optimization that
enforces data consistency among the labels we generate. Empirical evaluation of
our method on different datasets shows that it significantly outperforms
state-of-the-art weak supervision methods on both text and image classification
tasks.
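The abstract does not spell out the optimization itself, so the following is only a rough, non-authoritative sketch of the idea: binary labels, a simple penalty blend standing in for the paper's constrained optimization, and illustrative names such as `lam` that are assumptions rather than details from the paper. It alternates between fitting a classifier of the data representation to the current soft labels and pulling those labels back toward the weak signals wherever they have coverage.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def data_consistent_labels(X, weak_signals, n_iters=20, lam=1.0):
    """X: (n, d) feature matrix. weak_signals: (m, n) soft votes in [0, 1],
    with np.nan where a signal abstains (no coverage for that example)."""
    covered = ~np.isnan(weak_signals)              # (m, n) coverage mask
    has_cov = covered.any(axis=0)                  # examples with any coverage
    vote_sum = np.where(covered, weak_signals, 0.0).sum(axis=0)
    vote_cnt = covered.sum(axis=0)
    signal_mean = np.where(has_cov, vote_sum / np.maximum(vote_cnt, 1), 0.5)

    y = signal_mean.copy()                         # initial soft labels
    clf = LogisticRegression(max_iter=1000)
    for _ in range(n_iters):
        hard = (y >= 0.5).astype(int)
        if len(np.unique(hard)) < 2:               # degenerate labeling, stop early
            break
        clf.fit(X, hard)                           # classifier of the data representation
        p = clf.predict_proba(X)[:, 1]
        # Pull labels toward the classifier (data consistency) and toward the
        # weak signals where they cover; uncovered examples follow the classifier.
        target = np.where(has_cov, signal_mean, p)
        y = np.clip((p + lam * target) / (1.0 + lam), 0.0, 1.0)
    return y  # soft labels, including for examples with low or no coverage
```
Note how examples that no weak signal covers still receive labels through the classifier, which mirrors the coverage point made in the abstract.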
Related papers
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and
Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z) - Losses over Labels: Weakly Supervised Learning via Direct Loss
Construction [71.11337906077483]
Programmable weak supervision is a growing paradigm within machine learning.
We propose Losses over Labels (LoL), which creates losses directly from the heuristics without going through the intermediate step of a label.
We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks.
arXiv Detail & Related papers (2022-12-13T22:29:14Z) - Learned Label Aggregation for Weak Supervision [8.819582879892762]
We propose a data programming approach that aggregates weak supervision signals to generate labeled data easily.
The quality of the generated labels depends on a label aggregation model that aggregates all noisy labels from all LFs to infer the ground-truth labels.
We show the model can be trained using synthetically generated data and design an effective architecture for the model (a toy sketch of this synthetic-training idea appears after the related-papers list).
arXiv Detail & Related papers (2022-07-27T14:36:35Z) - Label Noise-Resistant Mean Teaching for Weakly Supervised Fake News
Detection [93.6222609806278]
We propose a novel label noise-resistant mean teaching approach (LNMT) for weakly supervised fake news detection.
LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data.
LNMT establishes a mean teacher framework equipped with label propagation and label reliability estimation.
arXiv Detail & Related papers (2022-06-10T16:01:58Z) - Training Subset Selection for Weak Supervision [17.03788288165262]
We show a tradeoff between the amount of weakly-labeled data and the precision of the weak labels.
We combine pretrained data representations with the cut statistic to select high-quality subsets of the weakly-labeled training data.
Using less weakly-labeled data improves the accuracy of weak supervision pipelines by up to 19% (absolute) on benchmark tasks.
arXiv Detail & Related papers (2022-06-06T21:31:32Z) - Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z) - Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, adaptively selects which unlabeled examples to use during training.
arXiv Detail & Related papers (2021-09-01T23:52:29Z) - Boosting Semi-Supervised Face Recognition with Noise Robustness [54.342992887966616]
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise introduced by auto-labelling.
We develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which is based on the robust training ability empowered by GroupNet (GN).
arXiv Detail & Related papers (2021-05-10T14:43:11Z) - Meta-Learning for Neural Relation Classification with Distant
Supervision [38.755055486296435]
We propose a meta-learning based approach, which learns to reweight noisy training data under the guidance of reference data.
Experiments on several datasets demonstrate that the reference data can effectively guide the selection of training data.
arXiv Detail & Related papers (2020-10-26T12:52:28Z) - Constrained Labeling for Weakly Supervised Learning [15.365232702938677]
We propose a simple data-free approach for combining weak supervision signals.
Our method is efficient and stable, converging after a few iterations of descent.
We show experimentally that our method outperforms other weak supervision methods on various text- and image-classification tasks.
arXiv Detail & Related papers (2020-09-15T21:30:53Z)
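The "Learned Label Aggregation" entry above mentions training the aggregation model on synthetically generated data. The sketch below is an illustrative toy version of that idea only; the vote simulator, the MLP aggregator, and all parameter values are assumptions, not the paper's architecture.
```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def simulate_lf_votes(n_examples, n_lfs, rng, acc_range=(0.6, 0.9), abstain=0.3):
    """Generate synthetic ground truth and noisy labeling-function votes (-1 = abstain)."""
    y = rng.integers(0, 2, size=n_examples)
    accs = rng.uniform(*acc_range, size=n_lfs)
    votes = np.full((n_examples, n_lfs), -1, dtype=int)
    for j in range(n_lfs):
        fires = rng.random(n_examples) > abstain          # which examples the LF labels
        correct = rng.random(n_examples) < accs[j]        # whether its label is correct
        votes[fires, j] = np.where(correct[fires], y[fires], 1 - y[fires])
    return votes, y

rng = np.random.default_rng(0)
votes_syn, y_syn = simulate_lf_votes(5000, n_lfs=10, rng=rng)
aggregator = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
aggregator.fit(votes_syn, y_syn)        # learn to map LF vote vectors to labels

votes_eval, y_eval = simulate_lf_votes(1000, n_lfs=10, rng=rng)  # stand-in for real LF outputs
print("aggregator accuracy:", aggregator.score(votes_eval, y_eval))
```
In practice the simulated evaluation votes would be replaced by the outputs of real labeling functions on unlabeled data, with the aggregator's predictions used as training labels.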