Related papers: Model Privacy: A Unified Framework to Understand Model Stealing Attacks and Defenses

Model Privacy: A Unified Framework to Understand Model Stealing Attacks and Defenses

URL: http://arxiv.org/abs/2502.15567v1
Date: Fri, 21 Feb 2025 16:29:11 GMT
Title: Model Privacy: A Unified Framework to Understand Model Stealing Attacks and Defenses
Authors: Ganghua Wang, Yuhong Yang, Jie Ding,
Abstract summary: This work presents a framework called Model Privacy'', providing a foundation for comprehensively analyzing model stealing attacks and defenses.<n>We propose methods to quantify the goodness of attack and defense strategies, and analyze the fundamental tradeoffs between utility and privacy in ML models.
Score: 11.939472526374246
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The use of machine learning (ML) has become increasingly prevalent in various domains, highlighting the importance of understanding and ensuring its safety. One pressing concern is the vulnerability of ML applications to model stealing attacks. These attacks involve adversaries attempting to recover a learned model through limited query-response interactions, such as those found in cloud-based services or on-chip artificial intelligence interfaces. While existing literature proposes various attack and defense strategies, these often lack a theoretical foundation and standardized evaluation criteria. In response, this work presents a framework called ``Model Privacy'', providing a foundation for comprehensively analyzing model stealing attacks and defenses. We establish a rigorous formulation for the threat model and objectives, propose methods to quantify the goodness of attack and defense strategies, and analyze the fundamental tradeoffs between utility and privacy in ML models. Our developed theory offers valuable insights into enhancing the security of ML models, especially highlighting the importance of the attack-specific structure of perturbations for effective defenses. We demonstrate the application of model privacy from the defender's perspective through various learning scenarios. Extensive experiments corroborate the insights and the effectiveness of defense mechanisms developed under the proposed framework.

Related papers

Secure Confidential Business Information When Sharing Machine Learning Models [15.330821160739232]
Confidential Property Inference (CPI) attacks can exploit shared Machine Learning (ML) models to uncover confidential properties of the model provider's private model training data.<n>Existing defenses often assume that CPI attacks are non-adaptive to the specific ML model they are targeting.<n>We propose a novel defense method that explicitly accounts for the responsive nature of real-world adversaries.
arXiv Detail & Related papers (2025-09-19T18:49:49Z)
A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives [65.3369988566853]
Recent studies have demonstrated that adversaries can replicate a target model's functionality.<n>Model Extraction Attacks pose threats to intellectual property, privacy, and system security.<n>We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments.
arXiv Detail & Related papers (2025-08-20T19:49:59Z)
A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models.<n>This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks.<n>We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z)
MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
Attack and defense techniques in large language models: A survey and new perspectives [5.600972861188751]
Large Language Models (LLMs) have become central to numerous natural language processing tasks, but their vulnerabilities present security and ethical challenges.<n>This systematic survey explores the evolving landscape of attack and defense techniques in LLMs.
arXiv Detail & Related papers (2025-05-02T03:37:52Z)
A Survey of Model Extraction Attacks and Defenses in Distributed Computing Environments [55.60375624503877]
Model Extraction Attacks (MEAs) threaten modern machine learning systems by enabling adversaries to steal models, exposing intellectual property and training data. This survey is motivated by the urgent need to understand how the unique characteristics of cloud, edge, and federated deployments shape attack vectors and defense requirements. We systematically examine the evolution of attack methodologies and defense mechanisms across these environments, demonstrating how environmental factors influence security strategies in critical sectors such as autonomous vehicles, healthcare, and financial services.
arXiv Detail & Related papers (2025-02-22T03:46:50Z)
Safety at Scale: A Comprehensive Survey of Large Model Safety [299.801463557549]
We present a comprehensive taxonomy of safety threats to large models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats.<n>We identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices.
arXiv Detail & Related papers (2025-02-02T05:14:22Z)
CALoR: Towards Comprehensive Model Inversion Defense [43.2642796582236]
Model Inversion Attacks (MIAs) aim at recovering privacy-sensitive training data from the knowledge encoded in released machine learning models. Recent advances in the MIA field have significantly enhanced the attack performance under multiple scenarios. We propose a robust defense mechanism, integrating Confidence Adaptation and Low-Rank compression.
arXiv Detail & Related papers (2024-10-08T08:44:01Z)
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models. Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs. Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models [18.624280305864804]
Large Language Models (LLMs) have become a cornerstone in the field of Natural Language Processing (NLP) This paper presents a comprehensive survey of the various forms of attacks targeting LLMs. We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation.
arXiv Detail & Related papers (2024-03-03T04:46:21Z)
SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models [74.58014281829946]
We analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on public models. Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models.
arXiv Detail & Related papers (2023-10-19T11:49:22Z)
A Framework for Understanding Model Extraction Attack and Defense [48.421636548746704]
We study tradeoffs between model utility from a benign user's view and privacy from an adversary's view. We develop new metrics to quantify such tradeoffs, analyze their theoretical properties, and develop an optimization problem to understand the optimal adversarial attack and defense strategies.
arXiv Detail & Related papers (2022-06-23T05:24:52Z)
ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc. We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing. Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
Improving Robustness to Model Inversion Attacks via Mutual Information Regularization [12.079281416410227]
This paper studies defense mechanisms against model inversion (MI) attacks. MI is a type of privacy attacks aimed at inferring information about the training data distribution given the access to a target machine learning model. We propose the Mutual Information Regularization based Defense (MID) against MI attacks.
arXiv Detail & Related papers (2020-09-11T06:02:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.