Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing
Inference Serving Systems
- URL: http://arxiv.org/abs/2307.01292v2
- Date: Sun, 6 Aug 2023 19:17:07 GMT
- Title: Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing
Inference Serving Systems
- Authors: Debopam Sanyal (Georgia Institute of Technology), Jui-Tse Hung
(Georgia Institute of Technology), Manav Agrawal (Georgia Institute of
Technology), Prahlad Jasti (Georgia Institute of Technology), Shahab Nikkhoo
(University of California, Riverside), Somesh Jha (University of
Wisconsin-Madison), Tianhao Wang (University of Virginia), Sibin Mohan
(George Washington University), Alexey Tumanov (Georgia Institute of
Technology)
- Abstract summary: Existing black-box attacks assume a single model can be repeatedly selected for serving inference requests.
We propose a query-efficient fingerprinting algorithm to enable the attacker to trigger any desired model consistently.
We counter the proposed attack with a noise-based defense mechanism that thwarts fingerprinting by adding noise to the specified performance metrics.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-serving systems have become increasingly popular, especially in
real-time web applications. In such systems, users send queries to the server
and specify the desired performance metrics (e.g., desired accuracy, latency).
The server maintains a set of models (model zoo) in the back-end and serves the
queries based on the specified metrics. This paper examines the security,
specifically robustness against model extraction attacks, of such systems.
Existing black-box attacks assume a single model can be repeatedly selected for
serving inference requests. Modern inference serving systems break this
assumption. Thus, they cannot be directly applied to extract a victim model, as
models are hidden behind a layer of abstraction exposed by the serving system.
An attacker can no longer identify which model she is interacting with. To this
end, we first propose a query-efficient fingerprinting algorithm to enable the
attacker to trigger any desired model consistently. We show that by using our
fingerprinting algorithm, model extraction can have fidelity and accuracy
scores within $1\%$ of the scores obtained when attacking a single, explicitly
specified model, as well as up to $14.6\%$ gain in accuracy and up to $7.7\%$
gain in fidelity compared to the naive attack. Second, we counter the proposed
attack with a noise-based defense mechanism that thwarts fingerprinting by
adding noise to the specified performance metrics. The proposed defense
strategy reduces the attack's accuracy and fidelity by up to $9.8\%$ and
$4.8\%$, respectively (on medium-sized model extraction). Third, we show that
the proposed defense induces a fundamental trade-off between the level of
protection and system goodput, achieving configurable and significant victim
model extraction protection while maintaining acceptable goodput ($>80\%$). We
implement the proposed defense in a real system with plans to open source.
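To make the fingerprinting step concrete, here is a minimal Python sketch of the idea. It is hedged throughout: `serve(inputs, min_accuracy, max_latency_ms)` stands in for a hypothetical client API that accepts per-query performance constraints and returns predictions; the paper's actual interface, constraint sweep, and stability test differ. The attacker sweeps the constraint space and keeps settings whose output signature is stable across repeated probes, suggesting a single back-end model is answering consistently.

```python
# Hypothetical sketch: pinning a back-end model via performance constraints.
# `serve(inputs, min_accuracy, max_latency_ms)` is an assumed client API,
# not the paper's actual interface.

def signature(serve, probes, min_acc, max_lat):
    """Hashable summary of the system's answers to fixed probe inputs."""
    return tuple(serve(probes, min_accuracy=min_acc, max_latency_ms=max_lat))

def pin_models(serve, probes, max_lat, acc_grid, repeats=3):
    """Sweep the accuracy constraint; keep settings whose signature is
    identical across repeats, i.e. one model answered every time."""
    pinned = {}
    for min_acc in acc_grid:
        sigs = {signature(serve, probes, min_acc, max_lat)
                for _ in range(repeats)}
        if len(sigs) == 1:  # stable routing: likely a single model
            pinned[min_acc] = sigs.pop()
    return pinned
```

Once a constraint setting reliably triggers one model, standard single-model extraction attacks can be mounted behind that setting, which is how the attack recovers near single-model fidelity and accuracy.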
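And a correspondingly hedged sketch of the noise-based defense: before model selection, the server jitters the client-specified metrics, so identical requests need not route to the same Pareto-optimal model. The model zoo, selection rule, and Gaussian noise below are illustrative assumptions, not the paper's implementation.

```python
import random

# Illustrative model zoo; the greedy selection rule and Gaussian jitter
# are assumptions for this sketch, not the paper's implementation.
MODEL_ZOO = [
    {"name": "small",  "accuracy": 0.71, "latency_ms": 5},
    {"name": "medium", "accuracy": 0.78, "latency_ms": 12},
    {"name": "large",  "accuracy": 0.84, "latency_ms": 30},
]

def select_model(min_acc, max_lat, noise_std=0.0):
    """Serve the cheapest model meeting the (jittered) constraints."""
    jittered = min_acc + random.gauss(0.0, noise_std)  # defense: perturb metric
    feasible = [m for m in MODEL_ZOO
                if m["accuracy"] >= jittered and m["latency_ms"] <= max_lat]
    return min(feasible, key=lambda m: m["latency_ms"]) if feasible else None
```

Raising `noise_std` randomizes routing more aggressively, which blunts fingerprinting but also more often serves a model that misses the client's stated constraint (or no model at all); this is the protection-versus-goodput trade-off quantified in the abstract.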
Related papers
- Verifying LLM Inference to Prevent Model Weight Exfiltration [1.4698862238090828]
An attacker controlling an inference server may exfiltrate model weights by hiding them within ordinary model outputs. This work investigates how to verify model responses to defend against such attacks and to detect anomalous or buggy behavior during inference. We formalize model exfiltration as a security game and propose a verification framework that can provably mitigate steganographic exfiltration.
arXiv Detail & Related papers (2025-11-04T14:51:44Z)
- The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage [71.8564105095189]
We introduce N-Gram Coverage Attack, a membership inference attack that relies solely on text outputs from the target model. We first demonstrate on a diverse set of existing benchmarks that N-Gram Coverage Attack outperforms other black-box methods. We find that more recent models, such as GPT-4o, exhibit increased robustness to membership inference.
arXiv Detail & Related papers (2025-08-13T08:35:16Z)
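As a rough illustration of the coverage statistic that summary describes (function names and the threshold below are assumptions, not the paper's reference implementation), one can score a candidate document by the fraction of its n-grams that also appear in text sampled from the target model:

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as a set."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_coverage(candidate, model_samples, n=4):
    """Fraction of the candidate's n-grams found in model outputs."""
    cand = ngrams(candidate.split(), n)
    seen = set()
    for sample in model_samples:
        seen |= ngrams(sample.split(), n)
    return len(cand & seen) / max(len(cand), 1)

def looks_like_member(candidate, model_samples, threshold=0.3):
    """Flag membership above a threshold calibrated on known
    non-members (the 0.3 here is purely illustrative)."""
    return ngram_coverage(candidate, model_samples) > threshold
```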
- MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access. Most existing defenses presume that attacker queries contain out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs. We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
- HoneypotNet: Backdoor Attacks Against Model Extraction [24.603590328055027]
Model extraction attacks pose severe security threats to production models and ML platforms.
We introduce a new defense paradigm, called "attack as defense", which modifies the model's output to be poisonous.
HoneypotNet can inject backdoors into substitute models with a high success rate.
arXiv Detail & Related papers (2025-01-02T06:23:51Z)
- ASPIRER: Bypassing System Prompts With Permutation-based Backdoors in LLMs [17.853862145962292]
We introduce a novel backdoor attack that systematically bypasses system prompts.
Our method achieves an attack success rate (ASR) of up to 99.50% while maintaining a clean accuracy (CACC) of 98.58%.
arXiv Detail & Related papers (2024-10-05T02:58:20Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can indicate the presence of a backdoor even when the models have different architectures.
This technique allows for the detection of backdoors in models designed for open-set classification tasks, a setting that is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions, which harms benign accuracy, InI trains models to produce uninformative outputs for stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Careful What You Wish For: on the Extraction of Adversarially Trained Models [2.707154152696381]
Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
arXiv Detail & Related papers (2022-07-21T16:04:37Z)
- A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit, OpenBackdoor, to foster the implementation and evaluation of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z)
- Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations [22.89321897726347]
We propose a novel and practical mechanism which enables the service provider to verify whether a suspect model is stolen from the victim model.
Our framework can detect model IP breaches with confidence $99.99\%$ within only $20$ fingerprints of the suspect model.
arXiv Detail & Related papers (2022-02-17T11:29:50Z)
- Increasing the Cost of Model Extraction with Calibrated Proof of Work [25.096196576476885]
In model extraction attacks, adversaries can steal a machine learning model exposed via a public API.
We propose requiring users to complete a proof-of-work before they can read the model's predictions.
arXiv Detail & Related papers (2022-01-23T12:21:28Z)
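A minimal hashcash-style sketch of that proof-of-work idea, assuming a SHA-256 puzzle whose difficulty the server sets per client (the paper's calibration scheme is more involved; all names here are illustrative):

```python
import hashlib
import itertools
import os

def issue_challenge():
    """Server: fresh random challenge attached to a prediction request."""
    return os.urandom(16)

def solve(challenge, difficulty_bits):
    """Client: find a nonce whose hash clears the difficulty target;
    expected work doubles with each extra difficulty bit."""
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge, nonce, difficulty_bits):
    """Server: cheap check before releasing the model's prediction."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

In this sketch, "calibration" reduces to raising `difficulty_bits` for clients whose query patterns look like extraction attempts, making large-scale querying expensive while leaving ordinary users largely unaffected.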
- Certifiers Make Neural Networks Vulnerable to Availability Attacks [70.69104148250614]
We show for the first time that fallback strategies can be deliberately triggered by an adversary.
In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback.
We design two novel availability attacks, which show the practical relevance of these threats.
arXiv Detail & Related papers (2021-08-25T15:49:10Z)
- Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can access neither the model information nor the training set, and cannot query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z)
- Probing Model Signal-Awareness via Prediction-Preserving Input Minimization [67.62847721118142]
We evaluate models' ability to capture the correct vulnerability signals to produce their predictions.
We measure the signal awareness of models using a new metric we propose: Signal-aware Recall (SAR).
The results show a sharp drop in the model's Recall from the high 90s to sub-60s with the new metric.
arXiv Detail & Related papers (2020-11-25T20:05:23Z)
- Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning [71.17774313301753]
We explore the robustness of self-supervised learned high-level representations by using them in the defense against adversarial attacks.
Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples.
arXiv Detail & Related papers (2020-06-05T03:03:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.