Labels, Information, and Computation: Efficient, Privacy-Preserving
Learning Using Sufficient Labels
- URL: http://arxiv.org/abs/2104.09015v1
- Date: Mon, 19 Apr 2021 02:15:25 GMT
- Title: Labels, Information, and Computation: Efficient, Privacy-Preserving
Learning Using Sufficient Labels
- Authors: Shiyu Duan and Jose C. Principe
- Abstract summary: We show that we do not always need full label information on every single training example.
We present a statistic (a summary) of the fully-labeled training set, which we call "sufficiently-labeled data", and prove its sufficiency and efficiency.
Sufficiently-labeled data naturally preserves user privacy by storing relative, instead of absolute, information.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In supervised learning, obtaining a large set of fully-labeled training data
is expensive. We show that we do not always need full label information on
every single training example to train a competent classifier. Specifically,
inspired by the principle of sufficiency in statistics, we present a statistic
(a summary) of the fully-labeled training set that captures almost all the
relevant information for classification but at the same time is easier to
obtain directly. We call this statistic "sufficiently-labeled data" and prove
its sufficiency and efficiency for finding the optimal hidden representations,
on which competent classifier heads can be trained using as few as a single
randomly-chosen fully-labeled example per class. Sufficiently-labeled data can
be obtained from annotators directly without collecting the fully-labeled data
first. We also prove that it is easier to obtain sufficiently-labeled data
directly than to obtain fully-labeled data. Furthermore, sufficiently-labeled data
naturally preserves user privacy by storing relative, instead of absolute,
information. Extensive experimental results are provided to support our theory.
Related papers
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- Doubly Robust Self-Training [46.168395767948965]
We introduce doubly robust self-training, a novel semi-supervised algorithm.
We demonstrate the superiority of the doubly robust loss over the standard self-training baseline.
arXiv Detail & Related papers (2023-06-01T00:57:16Z)
- Q-Match: Self-supervised Learning by Matching Distributions Induced by a Queue [6.1678491628787455]
We introduce our algorithm, Q-Match, and show it is possible to induce the student-teacher distributions without any knowledge of downstream classes.
We show that our method is sample efficient--in terms of both the labels required for downstream training and the amount of unlabeled data required for pre-training--and scales well to the sizes of both the labeled and unlabeled data.
arXiv Detail & Related papers (2023-02-10T18:59:05Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Unsupervised Selective Labeling for More Effective Semi-Supervised Learning [46.414510522978425]
Unsupervised selective labeling consistently improves SSL methods over state-of-the-art active learning given labeled data.
Our work sets a new standard for practical and efficient SSL.
arXiv Detail & Related papers (2021-10-06T18:25:50Z)
- A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels [49.990938653249415]
This research presents a methodology that assigns initial pseudo-labels to unlabeled data which is used as noisy-labeled data, and trains a deep neural network using the noisy-labeled data.
Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-03-08T11:46:02Z)
- Out-distribution aware Self-training in an Open World Setting [62.19882458285749]
We leverage unlabeled data in an open world setting to further improve prediction performance.
We introduce out-distribution aware self-training, which includes a careful sample selection strategy.
Our classifiers are by design out-distribution aware and can thus distinguish task-related inputs from unrelated ones.
arXiv Detail & Related papers (2020-12-21T12:25:04Z)
- Self-semi-supervised Learning to Learn from Noisy Labeled Data [3.18577806302116]
It is costly to obtain high-quality human-labeled data, leading to the active research area of training models robust to noisy labels.
In this project, we designed methods to more accurately differentiate clean and noisy labels and borrowed the wisdom of self-semi-supervised learning to train on noisy-labeled data.
arXiv Detail & Related papers (2020-11-03T02:31:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.