Mitigating Data Exfiltration Attacks through Layer-Wise Learning Rate Decay Fine-Tuning
- URL: http://arxiv.org/abs/2509.00027v1
- Date: Wed, 20 Aug 2025 09:05:01 GMT
- Title: Mitigating Data Exfiltration Attacks through Layer-Wise Learning Rate Decay Fine-Tuning
- Authors: Elie Thellier, Huiyu Li, Nicholas Ayache, Hervé Delingette
- Abstract summary: Data lakes enable the training of powerful machine learning models on sensitive, high-value medical datasets. Recent studies show adversaries can exfiltrate training data by embedding latent representations into model parameters. We propose a simple yet effective mitigation strategy that perturbs model parameters at export time through fine-tuning with a decaying layer-wise learning rate.
- Score: 4.613829141527782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data lakes enable the training of powerful machine learning models on sensitive, high-value medical datasets, but also introduce serious privacy risks due to potential leakage of protected health information. Recent studies show adversaries can exfiltrate training data by embedding latent representations into model parameters or inducing memorization via multi-task learning. These attacks disguise themselves as benign utility models while enabling reconstruction of high-fidelity medical images, posing severe privacy threats with legal and ethical implications. In this work, we propose a simple yet effective mitigation strategy that perturbs model parameters at export time through fine-tuning with a decaying layer-wise learning rate to corrupt embedded data without degrading task performance. Evaluations on DermaMNIST, ChestMNIST, and MIMIC-CXR show that our approach maintains utility task performance, effectively disrupts state-of-the-art exfiltration attacks, outperforms prior defenses, and renders exfiltrated data unusable for training. Ablations and discussions on adaptive attacks highlight challenges and future directions. Our findings offer a practical defense against data leakage in data lake-trained models and centralized federated learning.
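The abstract describes the defense only at a high level, so the snippet below is a minimal PyTorch sketch of what export-time fine-tuning with a layer-wise decaying learning rate could look like. The toy model, data loader, decay factor, and the direction of the decay (which end of the network receives the larger rate) are all placeholder assumptions, not the authors' actual configuration.

```python
import torch
from torch import nn, optim

def layerwise_lr_groups(model: nn.Module, base_lr: float, decay: float):
    """One optimizer param group per top-level block, with a learning rate
    that decays geometrically with depth. Which end of the network should
    get the larger rate is a guess here; the paper tunes this trade-off
    between corrupting embedded payloads and preserving task accuracy."""
    return [
        {"params": block.parameters(), "lr": base_lr * decay ** depth}
        for depth, block in enumerate(model.children())
    ]

def export_time_finetune(model, loader, steps=100, base_lr=1e-2, decay=0.5):
    """Briefly fine-tune on the utility task before releasing the weights,
    perturbing parameters unevenly across layers."""
    optimizer = optim.SGD(layerwise_lr_groups(model, base_lr, decay), lr=base_lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for step, (x, y) in enumerate(loader):
        if step >= steps:
            break
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    return model  # perturbed copy to export in place of the raw checkpoint
```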
Related papers
- Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection [17.369869625390894]
We propose a Metamorphosis Representation Projection (MRP) approach to machine unlearning. By implementing projective transformations in the hidden state space of specific network layers, our method effectively eliminates harmful information while preserving useful knowledge. Experimental results demonstrate that our approach enables effective continuous unlearning and successfully defends against relearning attacks.
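Only the abstract is available here, so the snippet below is one plausible reading of the core mechanism rather than MRP itself: a forward hook projects a chosen layer's hidden states onto the orthogonal complement of a subspace assumed to encode the harmful information. The toy MLP, the layer choice, and the random stand-in subspace `U` are all hypothetical.

```python
import torch
from torch import nn

def orthogonal_complement_projector(U: torch.Tensor) -> torch.Tensor:
    """P = I - U U^T removes the span of U's (orthonormal) columns."""
    d = U.shape[0]
    return torch.eye(d) - U @ U.T

def attach_projection_hook(layer: nn.Module, P: torch.Tensor):
    """Route the layer's output hidden states through P on every forward pass."""
    def hook(module, inputs, output):
        return output @ P.T  # works for (..., d)-shaped hidden states
    return layer.register_forward_hook(hook)

# Hypothetical usage on a toy MLP; in practice U would come from directions
# identified as encoding the information to unlearn (not shown here).
d = 32
mlp = nn.Sequential(nn.Linear(8, d), nn.ReLU(), nn.Linear(d, 4))
U, _ = torch.linalg.qr(torch.randn(d, 2))  # stand-in 2-dim "harmful" subspace
handle = attach_projection_hook(mlp[1], orthogonal_complement_projector(U))
```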
arXiv Detail & Related papers (2025-08-21T11:12:09Z)
- Self-Destructive Language Model [13.808746955144771]
Harmful fine-tuning attacks pose a major threat to the security of large language models (LLMs). We introduce SEAM, a novel alignment-enhancing defense that transforms LLMs into self-destructive models with intrinsic resilience to misalignment attempts.
arXiv Detail & Related papers (2025-05-18T01:08:18Z)
- Data Poisoning and Leakage Analysis in Federated Learning [10.090442512374661]
Data poisoning and leakage risks impede the massive deployment of federated learning in the real world.
This chapter reveals the truths and pitfalls of understanding two dominating threats: training data privacy intrusion and training data poisoning.
arXiv Detail & Related papers (2024-09-19T16:50:29Z)
- Remembering Everything Makes You Vulnerable: A Limelight on Machine Unlearning for Personalized Healthcare Sector [0.873811641236639]
This thesis aims to address the vulnerability of personalized healthcare models, particularly in the context of ECG monitoring.
We propose an approach termed "Machine Unlearning" to mitigate the impact of exposed data points on machine learning models.
arXiv Detail & Related papers (2024-07-05T15:38:36Z)
- Mellivora Capensis: A Backdoor-Free Training Framework on the Poisoned Dataset without Auxiliary Data [29.842087372804905]
This paper addresses the challenges of backdoor attack countermeasures in real-world scenarios.
We propose a robust and clean-data-free backdoor defense framework, namely Mellivora Capensis (MeCa), which enables the model trainer to train a clean model on the poisoned dataset.
arXiv Detail & Related papers (2024-05-21T12:20:19Z)
- On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks [58.718697580177356]
Attacks on deep learning models with malicious training samples are known as data poisoning.
Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving certified poisoning robustness.
Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness.
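For context, the snippet below sketches the Deep Partition Aggregation recipe in its usual textbook form: hash every training sample into one of k disjoint partitions, train one base model per partition, and predict by majority vote. The `dataset` and `train_fn` interfaces are hypothetical stand-ins, not the paper's evaluation code.

```python
import hashlib
from collections import Counter

def partition_index(sample_id: str, k: int) -> int:
    """Deterministically route each training sample to one of k disjoint partitions."""
    return int(hashlib.sha256(sample_id.encode()).hexdigest(), 16) % k

def train_dpa(dataset, k, train_fn):
    """dataset: iterable of (sample_id, x, y); train_fn: trains one base model
    on a list of (x, y) pairs. Both are hypothetical interfaces for this sketch."""
    parts = [[] for _ in range(k)]
    for sample_id, x, y in dataset:
        parts[partition_index(sample_id, k)].append((x, y))
    return [train_fn(part) for part in parts]

def dpa_predict(models, x):
    """Majority vote; a poisoner controlling m samples can corrupt at most m
    base models, so predictions whose vote margin exceeds 2m are certified."""
    votes = Counter(model(x) for model in models)  # each model maps x -> label
    return votes.most_common(1)[0][0]
```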
arXiv Detail & Related papers (2023-06-28T17:59:35Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
Our proposed defense, MESAS, is the first that is robust against strong adaptive adversaries and effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model, with the added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in resilience against membership inference attacks (MIAs).
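RelaxLoss's core idea is to stop driving the training loss below a target level, so that members and non-members become harder to separate. The sketch below shows only that sign-flip mechanism; the full method additionally flattens posteriors on non-target classes, and the threshold `alpha` is a hypothetical hyperparameter here.

```python
import torch
from torch import nn

def relaxed_loss_step(model, optimizer, criterion, x, y, alpha: float):
    """One training step that stops minimizing below a target loss level alpha.

    Minimal sketch of the core idea: when the batch loss is already below
    the target, ascend instead of descend, so the model never fits the
    training set more tightly than alpha."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    (loss if loss.item() >= alpha else -loss).backward()
    optimizer.step()
    return loss.item()
```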
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
- Learning to Learn Transferable Attack [77.67399621530052]
A transfer adversarial attack is a non-trivial black-box attack that crafts adversarial perturbations on a surrogate model and then applies those perturbations to the victim model.
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on widely used datasets demonstrate the effectiveness of our attack method, with a 12.85% higher transfer-attack success rate than state-of-the-art methods.
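LLTA's meta-learning over data and model augmentations is beyond a short snippet, but the transfer-attack setting it builds on is easy to illustrate: craft a perturbation on a white-box surrogate (here with single-step FGSM, a simpler stand-in for the paper's method) and measure how often it fools a separate victim model.

```python
import torch
from torch import nn

def fgsm_on_surrogate(surrogate: nn.Module, x: torch.Tensor, y: torch.Tensor,
                      eps: float = 8 / 255) -> torch.Tensor:
    """Craft a one-step (FGSM) perturbation against the white-box surrogate.
    Assumes image inputs scaled to [0, 1]."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(surrogate(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def transfer_success_rate(victim: nn.Module, x_adv: torch.Tensor,
                          y: torch.Tensor) -> float:
    """Fraction of adversarial examples that also fool the black-box victim."""
    with torch.no_grad():
        return (victim(x_adv).argmax(dim=1) != y).float().mean().item()
```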
arXiv Detail & Related papers (2021-12-10T07:24:21Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that publishes only labels is still susceptible to sampling attacks, and that the adversary can recover up to 100% of the score-based attack's performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
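The mechanics are easiest to see in a label-only sketch: repeatedly perturb the input, query the victim for labels alone, and score membership by how stable the predicted label is. The `query_label` callable, the noise scale, and the query count below are hypothetical stand-ins for the paper's exact sampling procedure.

```python
import torch

def label_agreement_score(query_label, x: torch.Tensor, n: int = 50,
                          sigma: float = 0.05) -> float:
    """Membership score from labels alone: perturb x repeatedly, query the
    victim's label each time, and measure agreement with the clean label.
    Higher agreement (robustness) is taken as evidence of membership.
    `query_label` stands in for black-box access returning only a label."""
    base = query_label(x)
    agree = sum(query_label(x + sigma * torch.randn_like(x)) == base
                for _ in range(n))
    return agree / n
```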
arXiv Detail & Related papers (2020-09-01T12:54:54Z)