Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning
- URL: http://arxiv.org/abs/2207.12535v1
- Date: Mon, 25 Jul 2022 21:17:24 GMT
- Title: Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning
- Authors: Xinlei He and Hongbin Liu and Neil Zhenqiang Gong and Yang Zhang
- Abstract summary: Semi-supervised learning (SSL) leverages both labeled and unlabeled data to train machine learning (ML) models.
We propose the first data augmentation-based membership inference attacks against ML models trained by SSL.
Our evaluation shows that the proposed attack can consistently outperform existing membership inference attacks.
- Score: 42.089020844936805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semi-supervised learning (SSL) leverages both labeled and unlabeled data to
train machine learning (ML) models. State-of-the-art SSL methods can achieve
performance comparable to supervised learning while using far less labeled
data. However, most existing work focuses on improving the performance of SSL.
In this work, we take a different angle by studying the training data privacy
of SSL. Specifically, we propose the first data augmentation-based membership
inference attacks against ML models trained by SSL. Given a data sample and
black-box access to a model, the goal of a membership inference attack is to
determine whether the data sample belongs to the model's training dataset.
Our evaluation shows that the proposed attack consistently outperforms
existing membership inference attacks and achieves the best performance against
models trained by SSL. Moreover, we uncover that the reason for membership
leakage in SSL is different from the commonly believed one in supervised
learning, i.e., overfitting (the gap between training and testing accuracy). We
observe that the SSL model generalizes well to the testing data (with almost
0 overfitting) but "memorizes" the training data by giving more confident
predictions regardless of their correctness. We also explore early stopping as a
countermeasure to prevent membership inference attacks against SSL. The results
show that early stopping can mitigate the membership inference attack, but at
the cost of degrading the model's utility.
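The attack intuition above (an SSL model is more confident on training members even without overfitting, and this signal survives data augmentation) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the noise-based `augment`, the toy `model` interface, and the threshold value are all assumptions.

```python
import numpy as np

def augment(x, rng, n_views=8):
    """Generate perturbed views of a sample (a stand-in for the
    flips/crops/color jitter used in SSL training pipelines)."""
    return [x + rng.normal(0.0, 0.1, size=x.shape) for _ in range(n_views)]

def membership_score(model, x, rng):
    """Aggregate the model's top confidence over augmented views.

    Intuition from the paper: an SSL-trained model tends to give
    more confident predictions on training members, regardless of
    whether those predictions are correct.
    """
    views = augment(x, rng)
    confidences = [float(np.max(model(v))) for v in views]
    return float(np.mean(confidences))

def infer_member(model, x, rng, threshold=0.9):
    """Predict membership by thresholding the aggregated confidence.

    In practice the threshold would be calibrated, e.g. on shadow
    models; 0.9 here is an arbitrary illustrative value.
    """
    return membership_score(model, x, rng) >= threshold
```

A toy deterministic `model` returning a fixed probability vector is enough to exercise the scoring logic; a real attack would query the target model's softmax output for each augmented view.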
Related papers
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
- ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods [56.073335779595475]
We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA)
ReCaLL examines the relative change in conditional log-likelihoods when prefixing target data points with non-member context.
We conduct comprehensive experiments and show that ReCaLL achieves state-of-the-art performance on the WikiMIA dataset.
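The ReCaLL statistic described above can be sketched as a ratio of log-likelihoods. The `cond_ll` and `ll` callables below are hypothetical stand-ins for language-model scoring calls, and the thresholding direction is dataset-dependent; this is only an illustration of the statistic, not the authors' code.

```python
def recall_score(cond_ll, ll, x, prefix):
    """ReCaLL-style statistic: the conditional log-likelihood of x
    given a non-member prefix, relative to its unconditional
    log-likelihood. Members and non-members are expected to shift
    differently when the prefix is prepended.

    cond_ll(x, prefix) -> float: log-likelihood of x given prefix.
    ll(x) -> float: unconditional log-likelihood of x.
    """
    return cond_ll(x, prefix) / ll(x)
```

A membership decision would then threshold this ratio, with the threshold (and direction) calibrated on held-out data.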
arXiv Detail & Related papers (2024-06-23T00:23:13Z)
- Reinforcement Learning-Guided Semi-Supervised Learning [20.599506122857328]
We propose a novel Reinforcement Learning Guided SSL method, RLGSSL, that formulates SSL as a one-armed bandit problem.
RLGSSL incorporates a carefully designed reward function that balances the use of labeled and unlabeled data to enhance generalization performance.
We demonstrate the effectiveness of RLGSSL through extensive experiments on several benchmark datasets and show that our approach achieves consistent superior performance compared to state-of-the-art SSL methods.
arXiv Detail & Related papers (2024-05-02T21:52:24Z)
- Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model.
Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data.
We propose to use pseudo-labels from the unlabelled data to update the feature extractor that is less sensitive to incorrect labels.
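Confidence-filtered pseudo-labeling, the mechanism the summary above refers to, can be sketched as below. The `model` interface and the 0.95 threshold are illustrative assumptions, not details from the paper.

```python
import numpy as np

def pseudo_label(model, unlabeled, threshold=0.95):
    """Assign pseudo-labels to unlabeled samples, keeping only
    high-confidence predictions. Filtering by confidence is a
    common way to reduce sensitivity to incorrect pseudo-labels.

    model(x) -> probability vector over classes.
    Returns (kept_samples, pseudo_labels).
    """
    kept_x, kept_y = [], []
    for x in unlabeled:
        probs = model(x)
        if np.max(probs) >= threshold:
            kept_x.append(x)
            kept_y.append(int(np.argmax(probs)))
    return kept_x, kept_y
```

The retained (sample, pseudo-label) pairs would then be fed back into training, e.g. to update the feature extractor as the paper proposes.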
arXiv Detail & Related papers (2023-09-09T01:57:14Z)
- Active Semi-Supervised Learning by Exploring Per-Sample Uncertainty and Consistency [30.94964727745347]
We propose a method called Active Semi-supervised Learning (ASSL) to improve model accuracy at a lower cost.
ASSL involves more dynamic model updates than Active Learning (AL) due to the use of unlabeled data.
ASSL achieved about 5.3 times higher computational efficiency than Semi-supervised Learning (SSL) while achieving the same performance.
arXiv Detail & Related papers (2023-03-15T22:58:23Z)
- Effective Targeted Attacks for Adversarial Self-Supervised Learning [58.14233572578723]
Unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information.
We propose a novel positive mining for targeted adversarial attack to generate effective adversaries for adversarial SSL frameworks.
Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and less but consistent robustness improvements with contrastive SSL frameworks.
arXiv Detail & Related papers (2022-10-19T11:43:39Z)
- On Higher Adversarial Susceptibility of Contrastive Self-Supervised Learning [104.00264962878956]
Contrastive self-supervised learning (CSL) has managed to match or surpass the performance of supervised learning in image and video classification.
It is still largely unknown whether the representations induced by the two learning paradigms are similar in nature.
We identify the uniform distribution of data representations over a unit hypersphere in the CSL representation space as the key contributor to CSL's higher adversarial susceptibility.
We devise strategies that are simple, yet effective in improving model robustness with CSL training.
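The "uniform distribution over a unit hypersphere" property mentioned above can be quantified with a uniformity statistic in the style of Wang and Isola's contrastive-learning analysis. The metric below is a common measure of this property, offered only as an illustration; it is not taken from the paper, and the temperature `t=2.0` is a conventional default.

```python
import numpy as np

def uniformity(embeddings, t=2.0):
    """Uniformity of representations on the unit hypersphere:
    log of the mean Gaussian potential over all distinct pairs.
    Lower values indicate features spread more evenly over the
    sphere, the property identified as driving CSL's adversarial
    susceptibility.
    """
    # Project rows onto the unit hypersphere.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Pairwise squared Euclidean distances between all rows.
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    n = len(z)
    off_diag = ~np.eye(n, dtype=bool)  # exclude self-pairs
    return float(np.log(np.exp(-t * sq_dists[off_diag]).mean()))
```

Two antipodal points are maximally spread in this sense, giving the minimum value for a two-point configuration.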
arXiv Detail & Related papers (2022-07-22T03:49:50Z) - Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for
Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z) - Semi-supervised learning objectives as log-likelihoods in a generative
model of data curation [32.45282187405337]
We formulate SSL objectives as a log-likelihood in a generative model of data curation.
We give a proof-of-principle for Bayesian SSL on toy data.
arXiv Detail & Related papers (2020-08-13T13:50:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.