Vulnerability of Transfer-Learned Neural Networks to Data Reconstruction Attacks in Small-Data Regime
- URL: http://arxiv.org/abs/2505.14323v1
- Date: Tue, 20 May 2025 13:09:22 GMT
- Title: Vulnerability of Transfer-Learned Neural Networks to Data Reconstruction Attacks in Small-Data Regime
- Authors: Tomasz Maciążek, Robert Allison,
- Abstract summary: Training data reconstruction attacks enable adversaries to recover portions of a released model's training data.<n>We consider the attacks where a reconstructor neural network learns to invert the (random) mapping between training data and model weights.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training data reconstruction attacks enable adversaries to recover portions of a released model's training data. We consider the attacks where a reconstructor neural network learns to invert the (random) mapping between training data and model weights. Prior work has shown that an informed adversary with access to released model's weights and all but one training data point can achieve high-quality reconstructions in this way. However, differential privacy can defend against such an attack with little to no loss in model's utility when the amount of training data is sufficiently large. In this work we consider a more realistic adversary who only knows the distribution from which a small training dataset has been sampled and who attacks a transfer-learned neural network classifier that has been trained on this dataset. We exhibit an attack that works in this realistic threat model and demonstrate that in the small-data regime it cannot be defended against by DP-SGD without severely damaging the classifier accuracy. This raises significant concerns about the use of such transfer-learned classifiers when protection of training-data is paramount. We demonstrate the effectiveness and robustness of our attack on VGG, EfficientNet and ResNet image classifiers transfer-learned on MNIST, CIFAR-10 and CelebA respectively. Additionally, we point out that the commonly used (true-positive) reconstruction success rate metric fails to reliably quantify the actual reconstruction effectiveness. Instead, we make use of the Neyman-Pearson lemma to construct the receiver operating characteristic curve and consider the associated true-positive reconstruction rate at a fixed level of the false-positive reconstruction rate.
Related papers
- SpooFL: Spoofing Federated Learning [54.05993847488204]
We introduce a spoofing-based defense that deceives attackers into believing they have recovered the true training data.<n>Unlike prior synthetic-data defenses that share classes or distributions with the private data, SFL uses a state-of-the-art generative model trained on an external dataset.<n>As a result, attackers are misled into recovering plausible yet completely irrelevant samples, preventing meaningful data leakage.
arXiv Detail & Related papers (2026-01-21T14:57:18Z) - Defending Against Neural Network Model Inversion Attacks via Data Poisoning [15.099559883494475]
Model inversion attacks pose a significant privacy threat to machine learning models.<n>This paper introduces a novel defense mechanism to better balance privacy and utility.<n>We propose a strategy that leverages data poisoning to contaminate the training data of inversion models.
arXiv Detail & Related papers (2024-12-10T15:08:56Z) - Leak and Learn: An Attacker's Cookbook to Train Using Leaked Data from Federated Learning [4.533760678036969]
Federated learning is a decentralized learning paradigm introduced to preserve privacy of client data.
Prior work has shown that an attacker can still reconstruct the private training data using only the client updates.
We explore data reconstruction attacks through the lens of training and improve models with leaked data.
arXiv Detail & Related papers (2024-03-26T23:05:24Z) - Adaptive Domain Inference Attack with Concept Hierarchy [4.772368796656325]
Most known model-targeted attacks assume attackers have learned the application domain or training data distribution.<n>Can removing the domain information from model APIs protect models from these attacks?<n>We show that the proposed adaptive domain inference attack (ADI) can still successfully estimate relevant subsets of training data.
arXiv Detail & Related papers (2023-12-22T22:04:13Z) - Post-train Black-box Defense via Bayesian Boundary Correction [9.769249984262958]
We propose a new post-train black-box defense framework for deep neural networks.
It can turn any pre-trained classifier into a resilient one with little knowledge of the model specifics.
It is further equipped with a new post-train strategy keeps the victim intact, avoiding re-training.
arXiv Detail & Related papers (2023-06-29T14:33:20Z) - Boosting Model Inversion Attacks with Adversarial Examples [26.904051413441316]
We propose a new training paradigm for a learning-based model inversion attack that can achieve higher attack accuracy in a black-box setting.
First, we regularize the training process of the attack model with an added semantic loss function.
Second, we inject adversarial examples into the training data to increase the diversity of the class-related parts.
arXiv Detail & Related papers (2023-06-24T13:40:58Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can be rather from some biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Understanding Reconstruction Attacks with the Neural Tangent Kernel and
Dataset Distillation [110.61853418925219]
We build a stronger version of the dataset reconstruction attack and show how it can provably recover the emphentire training set in the infinite width regime.
We show that both theoretically and empirically, reconstructed images tend to "outliers" in the dataset.
These reconstruction attacks can be used for textitdataset distillation, that is, we can retrain on reconstructed images and obtain high predictive accuracy.
arXiv Detail & Related papers (2023-02-02T21:41:59Z) - Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z) - RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach, known as adversarial training (AT), has been shown to mitigate robust training.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Reconstructing Training Data with Informed Adversaries [30.138217209991826]
Given access to a machine learning model, can an adversary reconstruct the model's training data?
This work studies this question from the lens of a powerful informed adversary who knows all the training data points except one.
We show it is feasible to reconstruct the remaining data point in this stringent threat model.
arXiv Detail & Related papers (2022-01-13T09:19:25Z) - Learning to Learn Transferable Attack [77.67399621530052]
Transfer adversarial attack is a non-trivial black-box adversarial attack that aims to craft adversarial perturbations on the surrogate model and then apply such perturbations to the victim model.
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on the widely-used dataset demonstrate the effectiveness of our attack method with a 12.85% higher success rate of transfer attack compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-10T07:24:21Z) - GRNN: Generative Regression Neural Network -- A Data Leakage Attack for
Federated Learning [3.050919759387984]
We show that image-based privacy data can be easily recovered in full from the shared gradient only via our proposed Generative Regression Neural Network (GRNN)
We evaluate our method on several image classification tasks. The results illustrate that our proposed GRNN outperforms state-of-the-art methods with better stability, stronger, and higher accuracy.
arXiv Detail & Related papers (2021-05-02T18:39:37Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The emphbackdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the emphfine-grained attack, where we treat the target label from the object-level instead of the image-level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z) - Exploring the Security Boundary of Data Reconstruction via Neuron
Exclusivity Analysis [23.07323180340961]
We study the security boundary of data reconstruction from gradient via a microcosmic view on neural networks with rectified linear units (ReLUs)
We construct a novel deterministic attack algorithm which substantially outperforms previous attacks for reconstructing training batches lying in the insecure boundary of a neural network.
arXiv Detail & Related papers (2020-10-26T05:54:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.