Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning
- URL: http://arxiv.org/abs/2207.12535v1
- Date: Mon, 25 Jul 2022 21:17:24 GMT
- Title: Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning
- Authors: Xinlei He and Hongbin Liu and Neil Zhenqiang Gong and Yang Zhang
- Abstract summary: Semi-supervised learning (SSL) leverages both labeled and unlabeled data to train machine learning (ML) models.
We propose the first data augmentation-based membership inference attacks against ML models trained by SSL.
Our evaluation shows that the proposed attack can consistently outperform existing membership inference attacks.
- Score: 42.089020844936805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semi-supervised learning (SSL) leverages both labeled and unlabeled data to
train machine learning (ML) models. State-of-the-art SSL methods can achieve
performance comparable to supervised learning while using far less labeled
data. However, most existing work focuses on improving the performance of SSL.
In this work, we take a different angle by studying the training data privacy
of SSL. Specifically, we propose the first data augmentation-based membership
inference attacks against ML models trained by SSL. Given a data sample and
black-box access to a model, the goal of a membership inference attack is to
determine whether the data sample belongs to the model's training dataset.
Our evaluation shows that the proposed attack consistently outperforms
existing membership inference attacks and achieves the best performance against
models trained by SSL. Moreover, we uncover that the reason for membership
leakage in SSL is different from the commonly believed one in supervised
learning, i.e., overfitting (the gap between training and testing accuracy). We
observe that the SSL model generalizes well to the testing data (with almost
0 overfitting) but "memorizes" the training data by giving more confident
predictions regardless of their correctness. We also explore early stopping as a
countermeasure to prevent membership inference attacks against SSL. The results
show that early stopping can mitigate the membership inference attack, but at
the cost of degrading the model's utility.
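The attack intuition above (an SSL model is more confident on training members even without overfitting, and this signal survives data augmentation) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the noise-based `augment`, the toy `model` interface, and the threshold value are all assumptions.

```python
import numpy as np

def augment(x, rng, n_views=8):
    """Generate perturbed views of a sample (a stand-in for the
    flips/crops/color jitter used in SSL training pipelines)."""
    return [x + rng.normal(0.0, 0.1, size=x.shape) for _ in range(n_views)]

def membership_score(model, x, rng):
    """Aggregate the model's top confidence over augmented views.

    Intuition from the paper: an SSL-trained model tends to give
    more confident predictions on training members, regardless of
    whether those predictions are correct.
    """
    views = augment(x, rng)
    confidences = [float(np.max(model(v))) for v in views]
    return float(np.mean(confidences))

def infer_member(model, x, rng, threshold=0.9):
    """Predict membership by thresholding the aggregated confidence.

    In practice the threshold would be calibrated, e.g. on shadow
    models; 0.9 here is an arbitrary illustrative value.
    """
    return membership_score(model, x, rng) >= threshold
```

A toy deterministic `model` returning a fixed probability vector is enough to exercise the scoring logic; a real attack would query the target model's softmax output for each augmented view.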
Related papers
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
- ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods [56.073335779595475]
We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA)
ReCaLL examines the relative change in conditional log-likelihoods when prefixing target data points with non-member context.
We conduct comprehensive experiments and show that ReCaLL achieves state-of-the-art performance on the WikiMIA dataset.
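The ReCaLL statistic described above can be sketched as a ratio of log-likelihoods. The `cond_ll` and `ll` callables below are hypothetical stand-ins for language-model scoring calls, and the thresholding direction is dataset-dependent; this is only an illustration of the statistic, not the authors' code.

```python
def recall_score(cond_ll, ll, x, prefix):
    """ReCaLL-style statistic: the conditional log-likelihood of x
    given a non-member prefix, relative to its unconditional
    log-likelihood. Members and non-members are expected to shift
    differently when the prefix is prepended.

    cond_ll(x, prefix) -> float: log-likelihood of x given prefix.
    ll(x) -> float: unconditional log-likelihood of x.
    """
    return cond_ll(x, prefix) / ll(x)
```

A membership decision would then threshold this ratio, with the threshold (and direction) calibrated on held-out data.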
arXiv Detail & Related papers (2024-06-23T00:23:13Z)
- Reinforcement Learning-Guided Semi-Supervised Learning [20.599506122857328]
We propose a novel Reinforcement Learning Guided SSL method, RLGSSL, that formulates SSL as a one-armed bandit problem.
RLGSSL incorporates a carefully designed reward function that balances the use of labeled and unlabeled data to enhance generalization performance.
We demonstrate the effectiveness of RLGSSL through extensive experiments on several benchmark datasets and show that our approach achieves consistent superior performance compared to state-of-the-art SSL methods.
arXiv Detail & Related papers (2024-05-02T21:52:24Z)
- Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model.
Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data.
We propose to use pseudo-labels from the unlabelled data to update the feature extractor that is less sensitive to incorrect labels.
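Confidence-filtered pseudo-labeling, the mechanism the summary above refers to, can be sketched as below. The `model` interface and the 0.95 threshold are illustrative assumptions, not details from the paper.

```python
import numpy as np

def pseudo_label(model, unlabeled, threshold=0.95):
    """Assign pseudo-labels to unlabeled samples, keeping only
    high-confidence predictions. Filtering by confidence is a
    common way to reduce sensitivity to incorrect pseudo-labels.

    model(x) -> probability vector over classes.
    Returns (kept_samples, pseudo_labels).
    """
    kept_x, kept_y = [], []
    for x in unlabeled:
        probs = model(x)
        if np.max(probs) >= threshold:
            kept_x.append(x)
            kept_y.append(int(np.argmax(probs)))
    return kept_x, kept_y
```

The retained (sample, pseudo-label) pairs would then be fed back into training, e.g. to update the feature extractor as the paper proposes.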
arXiv Detail & Related papers (2023-09-09T01:57:14Z)
- Active Semi-Supervised Learning by Exploring Per-Sample Uncertainty and Consistency [30.94964727745347]
We propose a method called Active Semi-supervised Learning (ASSL) to improve model accuracy at a lower cost.
ASSL involves more dynamic model updates than Active Learning (AL) due to the use of unlabeled data.
ASSL achieved about 5.3 times higher computational efficiency than Semi-supervised Learning (SSL) while achieving the same performance.
arXiv Detail & Related papers (2023-03-15T22:58:23Z)
- Effective Targeted Attacks for Adversarial Self-Supervised Learning [58.14233572578723]
Unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information.
We propose a novel positive mining for targeted adversarial attack to generate effective adversaries for adversarial SSL frameworks.
Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and less but consistent robustness improvements with contrastive SSL frameworks.
arXiv Detail & Related papers (2022-10-19T11:43:39Z)
- On Higher Adversarial Susceptibility of Contrastive Self-Supervised Learning [104.00264962878956]
Contrastive self-supervised learning (CSL) has managed to match or surpass the performance of supervised learning in image and video classification.
It is still largely unknown whether the representations induced by the two learning paradigms are similar in nature.
We identify the uniform distribution of data representations over a unit hypersphere in the CSL representation space as the key contributor to CSL's higher adversarial susceptibility.
We devise strategies that are simple, yet effective in improving model robustness with CSL training.
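The "uniform distribution over a unit hypersphere" property mentioned above can be quantified with a uniformity statistic in the style of Wang and Isola's contrastive-learning analysis. The metric below is a common measure of this property, offered only as an illustration; it is not taken from the paper, and the temperature `t=2.0` is a conventional default.

```python
import numpy as np

def uniformity(embeddings, t=2.0):
    """Uniformity of representations on the unit hypersphere:
    log of the mean Gaussian potential over all distinct pairs.
    Lower values indicate features spread more evenly over the
    sphere, the property identified as driving CSL's adversarial
    susceptibility.
    """
    # Project rows onto the unit hypersphere.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Pairwise squared Euclidean distances between all rows.
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    n = len(z)
    off_diag = ~np.eye(n, dtype=bool)  # exclude self-pairs
    return float(np.log(np.exp(-t * sq_dists[off_diag]).mean()))
```

Two antipodal points are maximally spread in this sense, giving the minimum value for a two-point configuration.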
arXiv Detail & Related papers (2022-07-22T03:49:50Z) - Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for
Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z) - Semi-supervised learning objectives as log-likelihoods in a generative
model of data curation [32.45282187405337]
We formulate SSL objectives as a log-likelihood in a generative model of data curation.
We give a proof-of-principle for Bayesian SSL on toy data.
arXiv Detail & Related papers (2020-08-13T13:50:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.