Private Semi-supervised Knowledge Transfer for Deep Learning from Noisy Labels
- URL: http://arxiv.org/abs/2211.01628v1
- Date: Thu, 3 Nov 2022 07:33:49 GMT
- Title: Private Semi-supervised Knowledge Transfer for Deep Learning from Noisy Labels
- Authors: Qiuchen Zhang, Jing Ma, Jian Lou, Li Xiong, and Xiaoqian Jiang
- Abstract summary: We propose PATE++, which combines advanced noisy-label training mechanisms with the original PATE framework to enhance its accuracy.
A novel structure of Generative Adversarial Nets (GANs) is developed to integrate the two effectively.
In addition, we develop a novel noisy label detection mechanism for semi-supervised model training to further improve student model performance when training with noisy labels.
- Score: 21.374707481630697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models trained on large-scale data have achieved encouraging
performance in many real-world tasks. Meanwhile, publishing those models
trained on sensitive datasets, such as medical records, could pose serious
privacy concerns. To counter these issues, one of the current state-of-the-art
approaches is the Private Aggregation of Teacher Ensembles, or PATE, which
achieved promising results in preserving the utility of the model while
providing a strong privacy guarantee. PATE combines an ensemble of "teacher
models" trained on sensitive data and transfers the knowledge to a "student"
model through the noisy aggregation of teachers' votes for labeling unlabeled
public data which the student model will be trained on. However, the knowledge
or voted labels learned by the student are noisy due to private aggregation.
Learning directly from noisy labels can significantly impact the accuracy of
the student model.
In this paper, we propose the PATE++ mechanism, which combines advanced
noisy-label training mechanisms with the original PATE framework to enhance its
accuracy. A novel structure of Generative Adversarial Nets (GANs) is developed
to integrate the two effectively. In addition, we develop a
novel noisy label detection mechanism for semi-supervised model training to
further improve student model performance when training with noisy labels. We
evaluate our method on Fashion-MNIST and SVHN, showing improvements over the
original PATE on all measures.
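To make the noisy aggregation step described in the abstract concrete, here is a minimal sketch of PATE-style vote aggregation for a single public example, in the spirit of the original PATE mechanism rather than the PATE++ variant proposed in this paper; the function name, the Laplace noise scale gamma, and the toy teacher votes are illustrative assumptions.

```python
import numpy as np

def noisy_aggregate(teacher_preds, num_classes, gamma=0.05, rng=None):
    """Label one public example from the noisy aggregation of teacher votes.

    teacher_preds: 1-D array of predicted class indices, one per teacher.
    gamma: inverse scale of the Laplace noise (illustrative value only;
           smaller gamma means more noise and a stronger privacy guarantee).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Histogram of teacher votes per class.
    votes = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    # Perturb each count with Laplace noise, then release only the argmax.
    votes += rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(votes))

# Toy usage: 250 hypothetical teachers voting on a 10-class problem.
rng = np.random.default_rng(0)
teacher_preds = rng.integers(0, 10, size=250)
print(noisy_aggregate(teacher_preds, num_classes=10, rng=rng))
```

Because the injected noise occasionally flips the winning class, a fraction of the student's training labels end up wrong, which is exactly the noisy-label problem that PATE++ targets.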
Related papers
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution.
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
arXiv Detail & Related papers (2024-10-15T06:51:25Z)
- Self-Regulated Data-Free Knowledge Amalgamation for Text Classification [9.169836450935724]
We develop a lightweight student network that can learn from multiple teacher models without accessing their original training data.
To accomplish this, we propose STRATANET, a modeling framework that produces text data tailored to each teacher.
We evaluate our method on three benchmark text classification datasets with varying labels or domains.
arXiv Detail & Related papers (2024-06-16T21:13:30Z)
- Few Shot Rationale Generation using Self-Training with Dual Teachers [4.91890875296663]
Self-rationalizing models that also generate a free-text explanation for their predicted labels are an important tool to build trustworthy AI applications.
We introduce a novel dual-teacher learning framework, which learns two specialized teacher models for task prediction and rationalization.
We formulate a new loss function, Masked Label Regularization (MLR) which promotes explanations to be strongly conditioned on predicted labels.
arXiv Detail & Related papers (2023-06-05T23:57:52Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning method comprising two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
- An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition [51.20130423303659]
We propose an ensemble learning framework with Poisson sub-sampling to train a collection of teacher models that provide a differential privacy (DP) guarantee for the training data.
Through boosting under DP, a student model derived from the training data suffers little degradation compared with models trained without privacy protection.
Our proposed solution leverages two mechanisms: (i) privacy budget amplification via Poisson sub-sampling, which lets a target prediction model achieve the same privacy budget with less noise, and (ii) a combination of the sub-sampling technique and an ensemble teacher-student learning framework.
arXiv Detail & Related papers (2022-10-12T16:34:08Z)
- ALM-KD: Knowledge Distillation with noisy labels via adaptive loss mixing [25.49637460661711]
Knowledge distillation is a technique where the outputs of a pretrained model are used for training a student model in a supervised setting.
We tackle this problem via the use of an adaptive loss mixing scheme during KD.
We demonstrate performance gains obtained using our approach in the standard KD setting as well as in multi-teacher and self-distillation settings.
arXiv Detail & Related papers (2022-02-07T14:53:22Z)
- Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training [66.80558875393565]
We study the problem of training named entity recognition (NER) models using only distantly-labeled data.
We propose a noise-robust learning scheme comprised of a new loss function and a noisy label removal step.
Our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
arXiv Detail & Related papers (2021-09-10T17:19:56Z)
- Deep k-NN for Noisy Labels [55.97221021252733]
We show that a simple $k$-nearest neighbor-based filtering approach on the logit layer of a preliminary model can remove mislabeled data and produce more accurate models than many recently proposed methods. (A rough sketch of this filtering idea appears after this list.)
arXiv Detail & Related papers (2020-04-26T05:15:36Z)
- Differentially Private Deep Learning with Smooth Sensitivity [144.31324628007403]
We study privacy concerns through the lens of differential privacy.
In this framework, privacy guarantees are generally obtained by perturbing models in such a way that specifics of data used to train the model are made ambiguous.
One of the most important techniques used in previous works involves an ensemble of teacher models, which return information to a student based on a noisy voting procedure.
In this work, we propose a novel voting mechanism with smooth sensitivity, which we call Immutable Noisy ArgMax, that, under certain conditions, can bear very large random noising from the teacher without affecting the useful information transferred to the student.
arXiv Detail & Related papers (2020-03-01T15:38:00Z)
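As a companion to the "Deep k-NN for Noisy Labels" entry above (referenced in that item), here is a rough sketch of k-nearest-neighbor filtering on a preliminary model's logits; the function name, the choice of k, and the use of scikit-learn are assumptions for illustration, not that paper's implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_label_filter(logits, noisy_labels, k=10):
    """Keep examples whose (possibly noisy) label matches the majority label
    of their k nearest neighbors in logit space.

    logits:       (n, num_classes) outputs of a preliminary model.
    noisy_labels: (n,) integer labels, possibly corrupted.
    Returns a boolean mask that is True for examples treated as clean.
    """
    # k + 1 neighbors because each point is returned as its own neighbor.
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(logits)
    _, idx = nbrs.kneighbors(logits)
    neighbor_labels = noisy_labels[idx[:, 1:]]  # drop the self-neighbor
    majority = np.array([np.bincount(row).argmax() for row in neighbor_labels])
    return majority == noisy_labels

# Toy usage with random stand-ins for logits and labels (illustrative only).
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
labels = rng.integers(0, 10, size=1000)
print(knn_label_filter(logits, labels, k=15).mean())
```

The same kind of filtering step, applied to the student's noisily labeled public data, is one way to see why noisy-label detection can complement a PATE-style knowledge transfer.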
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences.