Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing
Inference Serving Systems
- URL: http://arxiv.org/abs/2307.01292v2
- Date: Sun, 6 Aug 2023 19:17:07 GMT
- Title: Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing
Inference Serving Systems
- Authors: Debopam Sanyal (Georgia Institute of Technology), Jui-Tse Hung
(Georgia Institute of Technology), Manav Agrawal (Georgia Institute of
Technology), Prahlad Jasti (Georgia Institute of Technology), Shahab Nikkhoo
(University of California, Riverside), Somesh Jha (University of
Wisconsin-Madison), Tianhao Wang (University of Virginia), Sibin Mohan
(George Washington University), Alexey Tumanov (Georgia Institute of
Technology)
- Abstract summary: Existing black-box attacks assume a single model can be repeatedly selected for serving inference requests.
We propose a query-efficient fingerprinting algorithm to enable the attacker to trigger any desired model consistently.
We counter the proposed attack with a noise-based defense mechanism that thwarts fingerprinting by adding noise to the specified performance metrics.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-serving systems have become increasingly popular, especially in
real-time web applications. In such systems, users send queries to the server
and specify the desired performance metrics (e.g., desired accuracy, latency).
The server maintains a set of models (model zoo) in the back-end and serves the
queries based on the specified metrics. This paper examines the security,
specifically robustness against model extraction attacks, of such systems.
Existing black-box attacks assume a single model can be repeatedly selected for
serving inference requests. Modern inference serving systems break this
assumption. Thus, they cannot be directly applied to extract a victim model, as
models are hidden behind a layer of abstraction exposed by the serving system.
An attacker can no longer identify which model she is interacting with. To this
end, we first propose a query-efficient fingerprinting algorithm to enable the
attacker to trigger any desired model consistently. We show that by using our
fingerprinting algorithm, model extraction can have fidelity and accuracy
scores within $1\%$ of the scores obtained when attacking a single, explicitly
specified model, as well as up to $14.6\%$ gain in accuracy and up to $7.7\%$
gain in fidelity compared to the naive attack. Second, we counter the proposed
attack with a noise-based defense mechanism that thwarts fingerprinting by
adding noise to the specified performance metrics. The proposed defense
strategy reduces the attack's accuracy and fidelity by up to $9.8\%$ and
$4.8\%$, respectively (on medium-sized model extraction). Third, we show that
the proposed defense induces a fundamental trade-off between the level of
protection and system goodput, achieving configurable and significant victim
model extraction protection while maintaining acceptable goodput ($>80\%$). We
implement the proposed defense in a real system with plans to open source.
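To make the fingerprinting step concrete, here is a minimal Python sketch of the idea. It is hedged throughout: `serve(inputs, min_accuracy, max_latency_ms)` stands in for a hypothetical client API that accepts per-query performance constraints and returns predictions; the paper's actual interface, constraint sweep, and stability test differ. The attacker sweeps the constraint space and keeps settings whose output signature is stable across repeated probes, suggesting a single back-end model is answering consistently.

```python
# Hypothetical sketch: pinning a back-end model via performance constraints.
# `serve(inputs, min_accuracy, max_latency_ms)` is an assumed client API,
# not the paper's actual interface.

def signature(serve, probes, min_acc, max_lat):
    """Hashable summary of the system's answers to fixed probe inputs."""
    return tuple(serve(probes, min_accuracy=min_acc, max_latency_ms=max_lat))

def pin_models(serve, probes, max_lat, acc_grid, repeats=3):
    """Sweep the accuracy constraint; keep settings whose signature is
    identical across repeats, i.e. one model answered every time."""
    pinned = {}
    for min_acc in acc_grid:
        sigs = {signature(serve, probes, min_acc, max_lat)
                for _ in range(repeats)}
        if len(sigs) == 1:  # stable routing: likely a single model
            pinned[min_acc] = sigs.pop()
    return pinned
```

Once a constraint setting reliably triggers one model, standard single-model extraction attacks can be mounted behind that setting, which is how the attack recovers near single-model fidelity and accuracy.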
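And a correspondingly hedged sketch of the noise-based defense: before model selection, the server jitters the client-specified metrics, so identical requests need not route to the same Pareto-optimal model. The model zoo, selection rule, and Gaussian noise below are illustrative assumptions, not the paper's implementation.

```python
import random

# Illustrative model zoo; the greedy selection rule and Gaussian jitter
# are assumptions for this sketch, not the paper's implementation.
MODEL_ZOO = [
    {"name": "small",  "accuracy": 0.71, "latency_ms": 5},
    {"name": "medium", "accuracy": 0.78, "latency_ms": 12},
    {"name": "large",  "accuracy": 0.84, "latency_ms": 30},
]

def select_model(min_acc, max_lat, noise_std=0.0):
    """Serve the cheapest model meeting the (jittered) constraints."""
    jittered = min_acc + random.gauss(0.0, noise_std)  # defense: perturb metric
    feasible = [m for m in MODEL_ZOO
                if m["accuracy"] >= jittered and m["latency_ms"] <= max_lat]
    return min(feasible, key=lambda m: m["latency_ms"]) if feasible else None
```

Raising `noise_std` randomizes routing more aggressively, which blunts fingerprinting but also more often serves a model that misses the client's stated constraint (or no model at all); this is the protection-versus-goodput trade-off quantified in the abstract.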
Related papers
- Verifying LLM Inference to Prevent Model Weight Exfiltration [1.4698862238090828]
An attacker controlling an inference server may exfiltrate model weights by hiding them within ordinary model outputs. This work investigates how to verify model responses to defend against such attacks and to detect anomalous or buggy behavior during inference. We formalize model exfiltration as a security game and propose a verification framework that can provably mitigate steganographic exfiltration.
arXiv Detail & Related papers (2025-11-04T14:51:44Z)
- The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage [71.8564105095189]
We introduce N-Gram Coverage Attack, a membership inference attack that relies solely on text outputs from the target model. We first demonstrate on a diverse set of existing benchmarks that N-Gram Coverage Attack outperforms other black-box methods. We find that more recent models, such as GPT-4o, exhibit increased robustness to membership inference.
arXiv Detail & Related papers (2025-08-13T08:35:16Z)
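As a rough illustration of the coverage statistic that summary describes (function names and the threshold below are assumptions, not the paper's reference implementation), one can score a candidate document by the fraction of its n-grams that also appear in text sampled from the target model:

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as a set."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_coverage(candidate, model_samples, n=4):
    """Fraction of the candidate's n-grams found in model outputs."""
    cand = ngrams(candidate.split(), n)
    seen = set()
    for sample in model_samples:
        seen |= ngrams(sample.split(), n)
    return len(cand & seen) / max(len(cand), 1)

def looks_like_member(candidate, model_samples, threshold=0.3):
    """Flag membership above a threshold calibrated on known
    non-members (the 0.3 here is purely illustrative)."""
    return ngram_coverage(candidate, model_samples) > threshold
```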
- MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access. Most existing defenses presume that attacker queries contain out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs. We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
- HoneypotNet: Backdoor Attacks Against Model Extraction [24.603590328055027]
Model extraction attacks pose severe security threats to production models and ML platforms.
We introduce a new defense paradigm, called "attack as defense", which modifies the model's output to be poisonous.
HoneypotNet can inject backdoors into substitute models with a high success rate.
arXiv Detail & Related papers (2025-01-02T06:23:51Z)
- ASPIRER: Bypassing System Prompts With Permutation-based Backdoors in LLMs [17.853862145962292]
We introduce a novel backdoor attack that systematically bypasses system prompts.
Our method achieves an attack success rate (ASR) of up to 99.50% while maintaining a clean accuracy (CACC) of 98.58%.
arXiv Detail & Related papers (2024-10-05T02:58:20Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can indicate the presence of a backdoor even when the models have different architectures.
This technique allows for the detection of backdoors in models designed for open-set classification tasks, a setting that is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions, which harms benign accuracy, InI trains models to produce uninformative outputs for stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Careful What You Wish For: on the Extraction of Adversarially Trained Models [2.707154152696381]
Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
arXiv Detail & Related papers (2022-07-21T16:04:37Z)
- A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit, OpenBackdoor, to foster the implementation and evaluation of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z)
- Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations [22.89321897726347]
We propose a novel and practical mechanism which enables the service provider to verify whether a suspect model is stolen from the victim model.
Our framework can detect model IP breaches with confidence $99.99\%$ within only $20$ fingerprints of the suspect model.
arXiv Detail & Related papers (2022-02-17T11:29:50Z)
- Increasing the Cost of Model Extraction with Calibrated Proof of Work [25.096196576476885]
In model extraction attacks, adversaries can steal a machine learning model exposed via a public API.
We propose requiring users to complete a proof-of-work before they can read the model's predictions.
arXiv Detail & Related papers (2022-01-23T12:21:28Z)
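A minimal hashcash-style sketch of that proof-of-work idea, assuming a SHA-256 puzzle whose difficulty the server sets per client (the paper's calibration scheme is more involved; all names here are illustrative):

```python
import hashlib
import itertools
import os

def issue_challenge():
    """Server: fresh random challenge attached to a prediction request."""
    return os.urandom(16)

def solve(challenge, difficulty_bits):
    """Client: find a nonce whose hash clears the difficulty target;
    expected work doubles with each extra difficulty bit."""
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge, nonce, difficulty_bits):
    """Server: cheap check before releasing the model's prediction."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

In this sketch, "calibration" reduces to raising `difficulty_bits` for clients whose query patterns look like extraction attempts, making large-scale querying expensive while leaving ordinary users largely unaffected.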
- Certifiers Make Neural Networks Vulnerable to Availability Attacks [70.69104148250614]
We show for the first time that fallback strategies can be deliberately triggered by an adversary.
In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback.
We design two novel availability attacks, which show the practical relevance of these threats.
arXiv Detail & Related papers (2021-08-25T15:49:10Z)
- Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can access neither the model information nor the training set, and cannot query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z)
- Probing Model Signal-Awareness via Prediction-Preserving Input Minimization [67.62847721118142]
We evaluate models' ability to capture the correct vulnerability signals to produce their predictions.
We measure the signal awareness of models using a new metric we propose: Signal-aware Recall (SAR).
The results show a sharp drop in the model's Recall from the high 90s to sub-60s with the new metric.
arXiv Detail & Related papers (2020-11-25T20:05:23Z)
- Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning [71.17774313301753]
We explore the robustness of self-supervised learned high-level representations by using them in the defense against adversarial attacks.
Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples.
arXiv Detail & Related papers (2020-06-05T03:03:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.