Mondrian: Prompt Abstraction Attack Against Large Language Models for
Cheaper API Pricing
- URL: http://arxiv.org/abs/2308.03558v1
- Date: Mon, 7 Aug 2023 13:10:35 GMT
- Title: Mondrian: Prompt Abstraction Attack Against Large Language Models for
Cheaper API Pricing
- Authors: Wai Man Si, Michael Backes, Yang Zhang
- Abstract summary: We propose Mondrian, a simple method that abstracts sentences to lower the cost of using LLM APIs.
Our results show that Mondrian reduces user queries' token length by 13% to 23% across various tasks.
As a result, the prompt abstraction attack enables the adversary to profit without bearing the cost of API development and deployment.
- Score: 19.76564349397695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Machine Learning as a Service (MLaaS) market is rapidly expanding and
becoming more mature. For example, OpenAI's ChatGPT is an advanced large
language model (LLM) that generates responses for various queries with
associated fees. Although these models can deliver satisfactory performance,
they are far from perfect. Researchers have long studied the vulnerabilities
and limitations of LLMs, such as adversarial attacks and model toxicity.
Inevitably, commercial ML models are also not exempt from such issues, which
can be problematic as MLaaS continues to grow. In this paper, we discover a new
attack strategy against LLM APIs, namely the prompt abstraction attack.
Specifically, we propose Mondrian, a simple method that abstracts sentences
to lower the cost of using LLM APIs. In this
approach, the adversary first creates a pseudo API (with a lower established
price) to serve as the proxy of the target API (with a higher established
price). Next, the pseudo API leverages Mondrian to modify the user query,
obtain the abstracted response from the target API, and forward it back to the
end user. Our results show that Mondrian reduces user queries' token length
by 13% to 23% across various tasks, including text
classification, generation, and question answering. Meanwhile, these abstracted
queries do not significantly affect the utility of task-specific and general
language models like ChatGPT. Mondrian also reduces instruction prompts' token
length by at least 11% without compromising output quality. As a result, the
prompt abstraction attack enables the adversary to profit without bearing the
cost of API development and deployment.
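At its core, the attack is a price-arbitrage proxy. The sketch below illustrates that flow in Python; it is a minimal illustration under stated assumptions, not the paper's implementation: abstract_query is a toy stopword-dropping heuristic standing in for Mondrian's actual abstraction method, and call_target_api is a hypothetical placeholder for the higher-priced provider's chat-completion endpoint.

```python
# Minimal sketch of the prompt abstraction attack as a proxy service.
# All names here are hypothetical stand-ins: abstract_query is a toy
# heuristic, not the paper's Mondrian method, and call_target_api is a
# generic placeholder, not any specific provider's SDK.

STOPWORDS = {"the", "a", "an", "please", "kindly", "very", "really"}

def abstract_query(query: str) -> str:
    """Shorten a query by dropping low-information tokens (toy heuristic)."""
    kept = [w for w in query.split() if w.lower() not in STOPWORDS]
    return " ".join(kept)

def call_target_api(prompt: str) -> str:
    """Placeholder for the higher-priced target LLM API."""
    raise NotImplementedError("wire up a real chat-completion client here")

def pseudo_api(user_query: str) -> str:
    """The adversary's lower-priced proxy endpoint.

    1. Abstract (shorten) the user's query to cut input-token charges.
    2. Forward the abstracted query to the target API.
    3. Return the target's response unchanged to the end user.
    """
    shortened = abstract_query(user_query)
    return call_target_api(shortened)

if __name__ == "__main__":
    q = "Please kindly classify the sentiment of the following review: ..."
    print(abstract_query(q))  # fewer tokens -> lower per-query cost
```

The adversary's margin comes entirely from the token reduction: if the target API bills per input token and abstraction removes roughly 13% to 23% of a query's tokens, the pseudo API can undercut the target's list price by anything less than that saving and still profit, without training or hosting any model of its own.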
Related papers
- Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models [49.214291813478695]
Deep learning (DL) libraries, widely used in AI applications, often contain vulnerabilities like buffer overflows and use-after-free errors.
Traditional fuzzing struggles with the complexity and API diversity of DL libraries.
We propose DFUZZ, an LLM-driven fuzzing approach for DL libraries.
arXiv Detail & Related papers (2025-01-08T07:07:22Z) - An Engorgio Prompt Makes Large Language Model Babble on [25.148096060828397]
Auto-regressive large language models (LLMs) have yielded impressive performance in many real-world tasks.
In this paper, we explore their vulnerability to inference cost attacks, where a malicious user crafts Engorgio prompts.
We design Engorgio, a novel methodology, to efficiently generate adversarial Engorgio prompts to affect the target LLM's service availability.
arXiv Detail & Related papers (2024-12-27T01:00:23Z) - ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration [70.26807758443675]
ExploraCoder is a training-free framework that empowers large language models to invoke unseen APIs in code solutions.
We show that ExploraCoder significantly improves performance for models lacking prior API knowledge, achieving an absolute increase of 11.24% over naive RAG approaches and 14.07% over pretraining methods in pass@10.
arXiv Detail & Related papers (2024-12-06T19:00:15Z) - Denial-of-Service Poisoning Attacks against Large Language Models [64.77355353440691]
LLMs are vulnerable to denial-of-service (DoS) attacks, where spelling errors or non-semantic prompts trigger endless outputs without generating an [EOS] token.
We propose poisoning-based DoS attacks for LLMs, demonstrating that injecting a single poisoned sample designed for DoS purposes can break the output length limit.
arXiv Detail & Related papers (2024-10-14T17:39:31Z) - Aligning LLMs to Be Robust Against Prompt Injection [55.07562650579068]
We show that alignment can be a powerful tool to make LLMs more robust against prompt injection attacks.
Our method -- SecAlign -- first builds an alignment dataset by simulating prompt injection attacks.
Our experiments show that SecAlign robustifies the LLM substantially with a negligible hurt on model utility.
arXiv Detail & Related papers (2024-10-07T19:34:35Z) - ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents [7.166156709980112]
We introduce ShortcutsBench, a large-scale benchmark for the comprehensive evaluation of API-based agents.
ShortcutsBench includes a wealth of real APIs from Apple Inc.'s operating systems.
Our evaluation reveals significant limitations in handling complex queries related to API selection, parameter filling, and requesting necessary information from systems and users.
arXiv Detail & Related papers (2024-06-28T08:45:02Z) - Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z) - A Solution-based LLM API-using Methodology for Academic Information Seeking [49.096714812902576]
SoAy is a solution-based LLM API-using methodology for academic information seeking.
It uses code with a solution as the reasoning method, where a solution is a pre-constructed API calling sequence.
Results show a 34.58-75.99% performance improvement compared to state-of-the-art LLM API-based baselines.
arXiv Detail & Related papers (2024-05-24T02:44:14Z) - LLM+Reasoning+Planning for supporting incomplete user queries in presence of APIs [0.09374652839580183]
In practice, natural language task requests (user queries) are often incomplete, i.e., they may not contain all the information required by the APIs.
We leverage logical reasoning and classical AI planning along with an LLM for accurately answering user queries.
Our approach achieves over 95% success rate in most cases on a dataset containing complete and incomplete single goal and multi-goal queries.
arXiv Detail & Related papers (2024-05-21T01:16:34Z) - Logits of API-Protected LLMs Leak Proprietary Information [46.014638838911566]
Large language model (LLM) providers often hide the architectural details and parameters of their proprietary models by restricting public access to a limited API.
We show that it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries.
arXiv Detail & Related papers (2024-03-14T16:27:49Z) - Make Them Spill the Beans! Coercive Knowledge Extraction from
(Production) LLMs [31.80386572346993]
We exploit the fact that even when an LLM rejects a toxic request, a harmful response often hides deep in the output logits.
This approach differs from and outperforms jail-breaking methods, achieving 92% effectiveness compared to 62%, and is 10 to 20 times faster.
Our findings indicate that interrogation can extract toxic knowledge even from models specifically designed for coding tasks.
arXiv Detail & Related papers (2023-12-08T01:41:36Z) - Universal and Transferable Adversarial Attacks on Aligned Language
Models [118.41733208825278]
We propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors.
Surprisingly, we find that the adversarial prompts generated by our approach are quite transferable.
arXiv Detail & Related papers (2023-07-27T17:49:12Z) - Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z)