Model Privacy: A Unified Framework to Understand Model Stealing Attacks and Defenses
- URL: http://arxiv.org/abs/2502.15567v1
- Date: Fri, 21 Feb 2025 16:29:11 GMT
- Title: Model Privacy: A Unified Framework to Understand Model Stealing Attacks and Defenses
- Authors: Ganghua Wang, Yuhong Yang, Jie Ding,
- Abstract summary: This work presents a framework called Model Privacy'', providing a foundation for comprehensively analyzing model stealing attacks and defenses.<n>We propose methods to quantify the goodness of attack and defense strategies, and analyze the fundamental tradeoffs between utility and privacy in ML models.
- Score: 11.939472526374246
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of machine learning (ML) has become increasingly prevalent in various domains, highlighting the importance of understanding and ensuring its safety. One pressing concern is the vulnerability of ML applications to model stealing attacks. These attacks involve adversaries attempting to recover a learned model through limited query-response interactions, such as those found in cloud-based services or on-chip artificial intelligence interfaces. While existing literature proposes various attack and defense strategies, these often lack a theoretical foundation and standardized evaluation criteria. In response, this work presents a framework called ``Model Privacy'', providing a foundation for comprehensively analyzing model stealing attacks and defenses. We establish a rigorous formulation for the threat model and objectives, propose methods to quantify the goodness of attack and defense strategies, and analyze the fundamental tradeoffs between utility and privacy in ML models. Our developed theory offers valuable insights into enhancing the security of ML models, especially highlighting the importance of the attack-specific structure of perturbations for effective defenses. We demonstrate the application of model privacy from the defender's perspective through various learning scenarios. Extensive experiments corroborate the insights and the effectiveness of defense mechanisms developed under the proposed framework.
Related papers
- A Survey of Model Extraction Attacks and Defenses in Distributed Computing Environments [55.60375624503877]
Model Extraction Attacks (MEAs) threaten modern machine learning systems by enabling adversaries to steal models, exposing intellectual property and training data.
This survey is motivated by the urgent need to understand how the unique characteristics of cloud, edge, and federated deployments shape attack vectors and defense requirements.
We systematically examine the evolution of attack methodologies and defense mechanisms across these environments, demonstrating how environmental factors influence security strategies in critical sectors such as autonomous vehicles, healthcare, and financial services.
arXiv Detail & Related papers (2025-02-22T03:46:50Z) - Safety at Scale: A Comprehensive Survey of Large Model Safety [299.801463557549]
We present a comprehensive taxonomy of safety threats to large models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats.<n>We identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices.
arXiv Detail & Related papers (2025-02-02T05:14:22Z) - CALoR: Towards Comprehensive Model Inversion Defense [43.2642796582236]
Model Inversion Attacks (MIAs) aim at recovering privacy-sensitive training data from the knowledge encoded in released machine learning models.
Recent advances in the MIA field have significantly enhanced the attack performance under multiple scenarios.
We propose a robust defense mechanism, integrating Confidence Adaptation and Low-Rank compression.
arXiv Detail & Related papers (2024-10-08T08:44:01Z) - MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models [18.624280305864804]
Large Language Models (LLMs) have become a cornerstone in the field of Natural Language Processing (NLP)
This paper presents a comprehensive survey of the various forms of attacks targeting LLMs.
We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation.
arXiv Detail & Related papers (2024-03-03T04:46:21Z) - SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models [74.58014281829946]
We analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on public models.
Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models.
arXiv Detail & Related papers (2023-10-19T11:49:22Z) - A Framework for Understanding Model Extraction Attack and Defense [48.421636548746704]
We study tradeoffs between model utility from a benign user's view and privacy from an adversary's view.
We develop new metrics to quantify such tradeoffs, analyze their theoretical properties, and develop an optimization problem to understand the optimal adversarial attack and defense strategies.
arXiv Detail & Related papers (2022-06-23T05:24:52Z) - ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine
Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z) - Improving Robustness to Model Inversion Attacks via Mutual Information
Regularization [12.079281416410227]
This paper studies defense mechanisms against model inversion (MI) attacks.
MI is a type of privacy attacks aimed at inferring information about the training data distribution given the access to a target machine learning model.
We propose the Mutual Information Regularization based Defense (MID) against MI attacks.
arXiv Detail & Related papers (2020-09-11T06:02:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.