Secure Confidential Business Information When Sharing Machine Learning Models
- URL: http://arxiv.org/abs/2509.16352v1
- Date: Fri, 19 Sep 2025 18:49:49 GMT
- Title: Secure Confidential Business Information When Sharing Machine Learning Models
- Authors: Yunfan Yang, Jiarong Xu, Hongzhe Zhang, Xiao Fang
- Abstract summary: Confidential Property Inference (CPI) attacks can exploit shared Machine Learning (ML) models to uncover confidential properties of the model provider's private model training data. Existing defenses often assume that CPI attacks are non-adaptive to the specific ML model they are targeting. We propose a novel defense method that explicitly accounts for the responsive nature of real-world adversaries.
- Score: 15.330821160739232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-sharing offers significant business value by enabling firms with well-established Machine Learning (ML) models to monetize and share their models with others who lack the resources to develop ML models from scratch. However, concerns over data confidentiality remain a significant barrier to model-sharing adoption, as Confidential Property Inference (CPI) attacks can exploit shared ML models to uncover confidential properties of the model provider's private model training data. Existing defenses often assume that CPI attacks are non-adaptive to the specific ML model they are targeting. This assumption overlooks a key characteristic of real-world adversaries: their responsiveness, i.e., adversaries' ability to dynamically adjust their attack models based on the information of the target and its defenses. To overcome this limitation, we propose a novel defense method that explicitly accounts for the responsive nature of real-world adversaries via two methodological innovations: a novel Responsive CPI attack and an attack-defense arms race framework. The former emulates the responsive behaviors of adversaries in the real world, and the latter iteratively enhances both the target and attack models, ultimately producing a secure ML model that is robust against responsive CPI attacks. Furthermore, we propose and integrate a novel approximate strategy into our defense, which addresses a critical computational bottleneck of defense methods and improves defense efficiency. Through extensive empirical evaluations across various realistic model-sharing scenarios, we demonstrate that our method outperforms existing defenses by more effectively defending against CPI attacks, preserving ML model utility, and reducing computational overhead.
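To make the arms-race idea concrete, below is a minimal toy sketch of the loop the abstract describes, with flattened model weights standing in for model "signatures". Everything here (the least-squares attack, the projection-based defense step, all function names) is an illustrative assumption, not the paper's implementation.

```python
# Toy arms race: a responsive CPI attack is re-fit against the current
# defended models each round, and the defense then updates the models
# to suppress whatever that attack exploited.
import numpy as np

rng = np.random.default_rng(0)

def fit_attack(sigs, props):
    # Responsive CPI attack: a least-squares classifier re-trained
    # against the *current* defended signatures.
    X = np.column_stack([sigs, np.ones(len(sigs))])
    w, *_ = np.linalg.lstsq(X, props.astype(float), rcond=None)
    return w

def attack_accuracy(w, sigs, props):
    X = np.column_stack([sigs, np.ones(len(sigs))])
    return float(np.mean(((X @ w) > 0.5) == props))

def defend(sigs, w):
    # Defense step: project out the direction the attack relies on
    # (standing in for retraining the target to remove the leak).
    u = w[:-1] / (np.linalg.norm(w[:-1]) + 1e-12)
    return sigs - np.outer(sigs @ u, u)

# Shadow models whose weights leak a binary confidential property.
n, d = 64, 8
props = rng.integers(0, 2, n)
sigs = rng.normal(size=(n, d)) + 0.8 * props[:, None]

for rnd in range(4):                      # the arms race
    w = fit_attack(sigs, props)           # adversary adapts to the defense
    print(f"round {rnd}: attack accuracy = {attack_accuracy(w, sigs, props):.2f}")
    sigs = defend(sigs, w)                # defense adapts to the adversary
```

Note that the attack is re-fit against the already-defended models in every round; this is what distinguishes a responsive adversary from the static one assumed by prior defenses.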
Related papers
- A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives [65.3369988566853]
Recent studies have demonstrated that adversaries can replicate a target model's functionality. Model Extraction Attacks (MEAs) pose threats to intellectual property, privacy, and system security. We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments.
arXiv Detail & Related papers (2025-08-20T19:49:59Z)
- A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models. This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks. We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z)
- MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access. Most existing defenses presume that attacker queries contain out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs. We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
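A hedged sketch of the "ensemble of distilled models" idea named in the title above: queries are answered by an aggregate of student models distilled from the protected teacher, so an extractor never observes the teacher directly. The linear students, bootstrap splits, and all names below are assumptions, not the paper's method.

```python
# Toy MISLEADER-style defense: serve queries from an ensemble of
# distilled students rather than the protected teacher.
import numpy as np

rng = np.random.default_rng(0)

def distill_students(teacher_fn, X, k=3):
    # Each student fits the teacher's soft outputs on a different
    # bootstrap sample (a stand-in for real distillation training).
    students = []
    for _ in range(k):
        idx = rng.choice(len(X), size=len(X), replace=True)
        Xs, Ys = X[idx], teacher_fn(X[idx])
        W, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)  # linear student
        students.append(W)
    return students

def serve(students, x):
    # Aggregate the students' outputs; the extractor only ever sees
    # this blended response, never the teacher itself.
    return np.mean([x @ W for W in students], axis=0)

# Demo with a linear "teacher".
teacher_W = rng.normal(size=(4, 2))
teacher = lambda X: X @ teacher_W
X = rng.normal(size=(32, 4))
students = distill_students(teacher, X, k=3)
print(serve(students, rng.normal(size=4)))
```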
- Model Privacy: A Unified Framework to Understand Model Stealing Attacks and Defenses [11.939472526374246]
This work presents a framework called "Model Privacy", providing a foundation for comprehensively analyzing model stealing attacks and defenses. We propose methods to quantify the goodness of attack and defense strategies, and analyze the fundamental tradeoffs between utility and privacy in ML models.
arXiv Detail & Related papers (2025-02-21T16:29:11Z)
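One hedged way to make that utility-privacy tradeoff concrete (the notation below, defense D, utility U, secret s, baseline A_0, is ours, not the paper's):

```latex
% One possible formalization (our notation, an assumption): choose a
% defense D for model f that minimizes utility loss while capping the
% best attack A's advantage over a baseline A_0 that never sees the
% model, at some budget epsilon.
\min_{D}\; \bigl[\, U(f) - U\!\bigl(D(f)\bigr) \,\bigr]
\quad \text{s.t.} \quad
\max_{A}\; \Pr\!\bigl[A(D(f)) = s\bigr] - \Pr\!\bigl[A_0 = s\bigr] \le \varepsilon
```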
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
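A hedged sketch of a MirrorCheck-style detector based on the summary above: the target VLM captions the input, a T2I model re-renders the caption, and a low feature similarity between the input and the re-rendered image flags a likely attack. Here `caption_fn`, `t2i_fn`, and `embed_fn` are placeholders for real models, and the threshold is an assumption, not a value from the paper.

```python
# Placeholder-based sketch of caption -> re-render -> compare detection.
import numpy as np

def mirrorcheck_flag(image, caption_fn, t2i_fn, embed_fn, threshold=0.75):
    caption = caption_fn(image)        # the (possibly attacked) VLM describes the input
    regenerated = t2i_fn(caption)      # a T2I model re-renders that description
    a, b = embed_fn(image), embed_fn(regenerated)
    sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    # An adversarial input steers the caption away from the image's true
    # content, so the re-rendered image no longer matches: low similarity
    # flags a likely attack.
    return sim < threshold, sim
```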
- Ownership Protection of Generative Adversarial Networks [9.355840335132124]
Generative adversarial networks (GANs) have shown remarkable success in image synthesis.
It is critical to technically protect the intellectual property of GANs.
We propose a new ownership protection method based on the common characteristics of a target model and its stolen models.
arXiv Detail & Related papers (2023-06-08T14:31:58Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously. The proposed defense, MESAS, is the first robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
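A hedged sketch of multi-metric filtering in the spirit of the entry above: a client update must look benign on several metrics at once, so an adaptive adversary cannot optimize against any single detector. The specific metrics and the z-score cutoff are assumptions, not MESAS's actual design.

```python
# Toy multi-metric outlier filter over federated client updates.
import numpy as np

def multi_metric_filter(updates, z_cut=2.0):
    U = np.stack(updates)                       # (clients, params)
    mean_dir = U.mean(axis=0)
    mean_dir = mean_dir / (np.linalg.norm(mean_dir) + 1e-12)
    metrics = np.column_stack([
        np.linalg.norm(U, axis=1),              # update magnitude
        U @ mean_dir,                           # agreement with the mean direction
        U.var(axis=1),                          # per-update spread
    ])
    zscores = np.abs((metrics - metrics.mean(0)) / (metrics.std(0) + 1e-12))
    keep = (zscores < z_cut).all(axis=1)        # must look benign on every metric
    return [u for u, k in zip(updates, keep) if k]

# Demo: nine benign updates plus one scaled (poisoned) update.
rng = np.random.default_rng(0)
updates = [rng.normal(size=64) for _ in range(9)] + [10 * rng.normal(size=64)]
print(len(multi_metric_filter(updates)))        # the outlier is dropped
```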
- I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences [0.1031296820074812]
We study model stealing attacks, assessing their performance and exploring corresponding defence techniques in different settings.
We propose a taxonomy for attack and defence approaches, and provide guidelines on how to select the right attack or defence based on the goal and available resources.
arXiv Detail & Related papers (2022-06-16T21:16:41Z)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on modular, reusable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
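A hedged sketch of what a modular assessment harness like the one described above could look like; the `RiskReport` structure, `assess` function, and stub modules are illustrative assumptions, not ML-Doctor's actual API.

```python
# Toy modular risk-assessment harness: each attack is a plug-in
# module run against the same target model.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class RiskReport:
    scores: Dict[str, float]   # attack name -> success metric in [0, 1]

def assess(target, modules: Dict[str, Callable]) -> RiskReport:
    # Run each inference-attack module and collect its success metric.
    return RiskReport({name: attack(target) for name, attack in modules.items()})

# The four attack families covered by the paper, as stub modules:
modules = {
    "membership_inference": lambda m: 0.0,  # stub: plug in a real attack
    "model_inversion":      lambda m: 0.0,
    "attribute_inference":  lambda m: 0.0,
    "model_stealing":       lambda m: 0.0,
}
print(assess(target=None, modules=modules))
```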
- Improving Robustness to Model Inversion Attacks via Mutual Information Regularization [12.079281416410227]
This paper studies defense mechanisms against model inversion (MI) attacks.
MI is a type of privacy attack aimed at inferring information about the training data distribution given access to a target machine learning model.
We propose the Mutual Information Regularization based Defense (MID) against MI attacks.
arXiv Detail & Related papers (2020-09-11T06:02:44Z)
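A hedged sketch of an MID-style training objective: the usual task loss plus a penalty on (a variational upper bound of) the mutual information between the input and an internal representation Z. The Gaussian-bottleneck bound and the weight `lam` are standard proxies, assumptions rather than the paper's exact formulation.

```python
# Task loss plus a mutual-information penalty on an internal
# representation, limiting what the model's behavior reveals about
# its training distribution.
import torch
import torch.nn.functional as F

def mid_style_loss(logits, labels, mu, logvar, lam=0.01):
    ce = F.cross_entropy(logits, labels)                   # utility term
    # KL( N(mu, sigma^2) || N(0, I) ) upper-bounds I(X; Z).
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return ce + lam * kl                                   # privacy-utility tradeoff

# Usage with dummy tensors (mu/logvar would come from an encoder head):
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
mu, logvar = torch.randn(8, 16), torch.randn(8, 16)
print(mid_style_loss(logits, labels, mu, logvar))
```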