Securing Transfer-Learned Networks with Reverse Homomorphic Encryption
- URL: http://arxiv.org/abs/2505.14323v2
- Date: Mon, 27 Oct 2025 19:24:41 GMT
- Title: Securing Transfer-Learned Networks with Reverse Homomorphic Encryption
- Authors: Robert Allison, Tomasz Maciążek, Henry Bourne
- Abstract summary: We show that differentially private (DP) training can defend against training-data reconstruction attacks.
We propose a novel homomorphic encryption (HE) method that protects training data without degrading the model's accuracy.
- Score: 0.764671395172401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing body of literature on training-data reconstruction attacks raises significant concerns about deploying neural network classifiers trained on sensitive data. However, differentially private (DP) training (e.g. using DP-SGD) can defend against such attacks when training datasets are large, at only a minimal loss of network utility. Folklore, heuristics, and (albeit pessimistic) DP bounds suggest this fails for networks trained with small per-class datasets, yet to the best of our knowledge the literature offers no compelling evidence. We directly demonstrate this vulnerability by significantly extending reconstruction attack capabilities under a realistic adversary threat model for few-shot transfer-learned image classifiers. We design new white-box and black-box attacks and find that DP-SGD is unable to defend against these without significant classifier utility loss. To address this, we propose a novel homomorphic encryption (HE) method that protects training data without degrading the model's accuracy. Conventional HE secures a model's input data and requires a costly homomorphic implementation of the entire classifier. In contrast, our new scheme is computationally efficient and protects training data rather than input data. This is achieved by means of a simple role reversal: classifier input data is unencrypted, but the transfer-learned weights are encrypted. Classifier outputs remain encrypted, thus preventing both white-box and black-box (and any other) training-data reconstruction attacks. Under this new scheme, only a trusted party with a private decryption key can obtain the classifier's class decisions.
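To make the role reversal concrete, below is a toy sketch using textbook Paillier encryption (additively homomorphic), which is enough to evaluate an encrypted linear classifier head: the transfer-learned weights are encrypted, the input features stay in plaintext, and the logits come out encrypted, so only the holder of the private key can read class decisions. The scheme, key sizes, and fixed-point scaling here are illustrative assumptions, not the paper's actual construction.

    import math
    import random

    def keygen(p=2147483647, q=2305843009213693951):
        # Toy Mersenne primes; far too small to be secure in practice.
        n = p * q
        lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
        mu = pow(lam, -1, n)  # with g = n+1, L(g^lam mod n^2) = lam mod n
        return n, (lam, mu)

    def encrypt(n, m):
        r = random.randrange(1, n)
        return pow(n + 1, m % n, n * n) * pow(r, n, n * n) % (n * n)

    def decrypt(n, lam, mu, c):
        m = (pow(c, lam, n * n) - 1) // n * mu % n
        return m - n if m > n // 2 else m  # decode negative values

    SCALE = 1000  # fixed-point scaling for real-valued weights

    def encrypted_logit(n, enc_w, x_int):
        # Enc(a)*Enc(b) = Enc(a+b) and Enc(a)^k = Enc(a*k), so the inner
        # product <w, x> is computed entirely on ciphertext weights.
        acc = encrypt(n, 0)
        for cw, xi in zip(enc_w, x_int):
            acc = acc * pow(cw, xi % n, n * n) % (n * n)
        return acc

    n, (lam, mu) = keygen()
    w = [0.5, -1.2, 2.0]  # secret transfer-learned head weights
    x = [1, 0, 3]         # plaintext classifier input features
    enc_w = [encrypt(n, round(wi * SCALE)) for wi in w]
    c = encrypted_logit(n, enc_w, x)
    print(decrypt(n, lam, mu, c) / SCALE)  # 6.5, readable only with the key

A bias term can be folded in by multiplying Enc(b) into the accumulator, and the argmax over encrypted logits is taken by the trusted party after decryption, matching the abstract's claim that only the key holder obtains class decisions.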
Related papers
- SpooFL: Spoofing Federated Learning [54.05993847488204]
We introduce a spoofing-based defense that deceives attackers into believing they have recovered the true training data.
Unlike prior synthetic-data defenses that share classes or distributions with the private data, SFL uses a state-of-the-art generative model trained on an external dataset.
As a result, attackers are misled into recovering plausible yet completely irrelevant samples, preventing meaningful data leakage.
arXiv Detail & Related papers (2026-01-21T14:57:18Z)
- Defending Against Neural Network Model Inversion Attacks via Data Poisoning [15.099559883494475]
Model inversion attacks pose a significant privacy threat to machine learning models.
This paper introduces a novel defense mechanism to better balance privacy and utility.
We propose a strategy that leverages data poisoning to contaminate the training data of inversion models.
arXiv Detail & Related papers (2024-12-10T15:08:56Z)
- Leak and Learn: An Attacker's Cookbook to Train Using Leaked Data from Federated Learning [4.533760678036969]
Federated learning is a decentralized learning paradigm introduced to preserve privacy of client data.
Prior work has shown that an attacker can still reconstruct the private training data using only the client updates.
We explore data reconstruction attacks through the lens of training and improve models with leaked data.
arXiv Detail & Related papers (2024-03-26T23:05:24Z)
- Adaptive Domain Inference Attack with Concept Hierarchy [4.772368796656325]
Most known model-targeted attacks assume attackers have learned the application domain or training data distribution.
Can removing the domain information from model APIs protect models from these attacks?
We show that the proposed adaptive domain inference attack (ADI) can still successfully estimate relevant subsets of training data.
arXiv Detail & Related papers (2023-12-22T22:04:13Z)
- Post-train Black-box Defense via Bayesian Boundary Correction [9.769249984262958]
We propose a new post-train black-box defense framework for deep neural networks.
It can turn any pre-trained classifier into a resilient one with little knowledge of the model specifics.
It is further equipped with a new post-train strategy that keeps the victim model intact, avoiding re-training.
arXiv Detail & Related papers (2023-06-29T14:33:20Z)
- Boosting Model Inversion Attacks with Adversarial Examples [26.904051413441316]
We propose a new training paradigm for a learning-based model inversion attack that can achieve higher attack accuracy in a black-box setting.
First, we regularize the training process of the attack model with an added semantic loss function.
Second, we inject adversarial examples into the training data to increase the diversity of the class-related parts.
arXiv Detail & Related papers (2023-06-24T13:40:58Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data can instead come from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Understanding Reconstruction Attacks with the Neural Tangent Kernel and Dataset Distillation [110.61853418925219]
We build a stronger version of the dataset reconstruction attack and show how it can provably recover the entire training set in the infinite width regime.
We show, both theoretically and empirically, that reconstructed images tend to be "outliers" in the dataset.
These reconstruction attacks can be used for dataset distillation, that is, we can retrain on reconstructed images and obtain high predictive accuracy.
arXiv Detail & Related papers (2023-02-02T21:41:59Z)
- Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
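A generic gradient-matching sketch of this family of attacks (not the paper's provable algorithm): given gradients observed at known parameters, the adversary optimizes a dummy input and a soft label until their induced gradient matches. The model, shapes, and optimizer settings are illustrative.

    import torch

    def reconstruct(model, observed_grads, x_shape, num_classes, steps=100):
        x = torch.randn(x_shape, requires_grad=True)
        y = torch.randn(1, num_classes, requires_grad=True)  # soft label
        opt = torch.optim.LBFGS([x, y])

        def closure():
            opt.zero_grad()
            # Soft-label cross_entropy needs PyTorch >= 1.10.
            loss = torch.nn.functional.cross_entropy(
                model(x), torch.softmax(y, dim=-1))
            grads = torch.autograd.grad(loss, model.parameters(),
                                        create_graph=True)
            match = sum(((g - og) ** 2).sum()
                        for g, og in zip(grads, observed_grads))
            match.backward()
            return match

        for _ in range(steps):
            opt.step(closure)
        return x.detach()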
arXiv Detail & Related papers (2022-12-07T15:32:22Z)
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
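A heavily simplified sketch of the relaxed-loss intuition; the paper's actual method alternates gradient ascent with posterior flattening, whereas this version merely stops pushing the training loss below a target alpha, which is the core idea. alpha and the model are placeholders.

    import torch

    def relaxed_loss_step(model, x, y, optimizer, alpha=0.5):
        optimizer.zero_grad()
        ce = torch.nn.functional.cross_entropy(model(x), y)
        # Driving the training loss toward zero is what makes members
        # distinguishable; training toward alpha instead narrows the
        # member/non-member loss gap that MIAs exploit.
        torch.abs(ce - alpha).backward()
        optimizer.step()
        return ce.item()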
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been shown to improve robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
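For reference, a minimal single-machine PGD adversarial-training step, i.e. the standard AT recipe that the paper scales to large batches over multiple machines; the distributed machinery itself is not sketched, and the hyperparameters are the usual illustrative defaults.

    import torch

    def pgd_at_step(model, x, y, optimizer, eps=8/255, alpha=2/255, iters=7):
        # Inner maximization: find a worst-case perturbation in the eps ball.
        delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(iters):
            loss = torch.nn.functional.cross_entropy(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
            delta = delta.detach().requires_grad_(True)
        # Outer minimization: train on the adversarial examples.
        optimizer.zero_grad()
        adv = (x + delta.detach()).clamp(0, 1)
        torch.nn.functional.cross_entropy(model(adv), y).backward()
        optimizer.step()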
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Reconstructing Training Data with Informed Adversaries [30.138217209991826]
Given access to a machine learning model, can an adversary reconstruct the model's training data?
This work studies this question from the lens of a powerful informed adversary who knows all the training data points except one.
We show it is feasible to reconstruct the remaining data point in this stringent threat model.
arXiv Detail & Related papers (2022-01-13T09:19:25Z)
- Learning to Learn Transferable Attack [77.67399621530052]
Transfer adversarial attack is a non-trivial black-box adversarial attack that aims to craft adversarial perturbations on the surrogate model and then apply such perturbations to the victim model.
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on widely used datasets demonstrate the effectiveness of our attack method, with a 12.85% higher transfer-attack success rate compared with state-of-the-art methods.
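A bare-bones illustration of the transfer-attack setting this paper improves on (one-step FGSM crafted on a surrogate, then evaluated on the victim); LLTA's meta-learning over data and model augmentation is not shown.

    import torch

    def transfer_attack(surrogate, victim, x, y, eps=8/255):
        x = x.clone().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(surrogate(x), y)
        grad, = torch.autograd.grad(loss, x)
        x_adv = (x + eps * grad.sign()).clamp(0, 1)  # FGSM on the surrogate
        return victim(x_adv).argmax(dim=-1)          # check transfer to the victim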
arXiv Detail & Related papers (2021-12-10T07:24:21Z)
- GRNN: Generative Regression Neural Network -- A Data Leakage Attack for Federated Learning [3.050919759387984]
We show that image-based privacy data can be fully recovered from the shared gradient alone via our proposed Generative Regression Neural Network (GRNN).
We evaluate our method on several image classification tasks. The results illustrate that our proposed GRNN outperforms state-of-the-art methods with better stability, stronger robustness, and higher accuracy.
arXiv Detail & Related papers (2021-05-02T18:39:37Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
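An illustrative image-level trigger-poisoning sketch for a toy classification setup; the paper's fine-grained attack instead poisons object-level labels in segmentation masks. Trigger shape, poison rate, and tensor layout (NCHW, values in [0, 1]) are assumptions.

    import torch

    def poison(images, labels, target_class, rate=0.05, patch=4):
        # Stamp a small white square into a random fraction of the
        # training images and relabel them with the attacker's target class.
        idx = torch.randperm(images.size(0))[: int(rate * images.size(0))]
        images, labels = images.clone(), labels.clone()
        images[idx, :, -patch:, -patch:] = 1.0
        labels[idx] = target_class
        return images, labels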
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Exploring the Security Boundary of Data Reconstruction via Neuron Exclusivity Analysis [23.07323180340961]
We study the security boundary of data reconstruction from gradients via a microcosmic view of neural networks with rectified linear units (ReLUs).
We construct a novel deterministic attack algorithm which substantially outperforms previous attacks for reconstructing training batches lying in the insecure boundary of a neural network.
arXiv Detail & Related papers (2020-10-26T05:54:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.