Related papers: Exploiting Novel GPT-4 APIs

Exploiting Novel GPT-4 APIs

URL: http://arxiv.org/abs/2312.14302v2
Date: Sun, 4 Aug 2024 17:48:33 GMT
Title: Exploiting Novel GPT-4 APIs
Authors: Kellin Pelrine, Mohammad Taufeeque, Michał Zając, Euan McLean, Adam Gleave,
Abstract summary: We explore three new functionalities exposed in the GPT-4 APIs: fine-tuning, function calling and knowledge retrieval. We find that fine-tuning a model on as few as 15 harmful examples or 100 benign examples can remove core safeguards from GPT-4. We find that GPT-4 Assistants readily divulge the function call schema and can be made to execute arbitrary function calls.
Score: 3.8710514689921296
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Language model attacks typically assume one of two extreme threat models: full white-box access to model weights, or black-box access limited to a text generation API. However, real-world APIs are often more flexible than just text generation: these APIs expose "gray-box" access leading to new threat vectors. To explore this, we red-team three new functionalities exposed in the GPT-4 APIs: fine-tuning, function calling and knowledge retrieval. We find that fine-tuning a model on as few as 15 harmful examples or 100 benign examples can remove core safeguards from GPT-4, enabling a range of harmful outputs. Furthermore, we find that GPT-4 Assistants readily divulge the function call schema and can be made to execute arbitrary function calls. Finally, we find that knowledge retrieval can be hijacked by injecting instructions into retrieval documents. These vulnerabilities highlight that any additions to the functionality exposed by an API can create new vulnerabilities.

Related papers

Fundamental Limitations in Defending LLM Finetuning APIs [61.29028411001255]
We show that defences of fine-tuning APIs are fundamentally limited in their ability to prevent fine-tuning attacks. We construct 'pointwise-undetectable' attacks that repurpose entropy in benign model outputs to covertly transmit dangerous knowledge. We test our attacks against the OpenAI fine-tuning API, finding they succeed in eliciting answers to harmful multiple-choice questions.
arXiv Detail & Related papers (2025-02-20T18:45:01Z)
ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration [70.26807758443675]
ExploraCoder is a training-free framework that empowers large language models to invoke unseen APIs in code solution. We show that ExploraCoder significantly improves performance for models lacking prior API knowledge, achieving an absolute increase of 11.24% over niave RAG approaches and 14.07% over pretraining methods in pass@10.
arXiv Detail & Related papers (2024-12-06T19:00:15Z)
Adaptive Exploit Generation against Security Devices and Security APIs [3.706222947143855]
We show how to automatically derive proof-of-concept exploits against Security APIs using formal methods. We extend the popular protocol verifier ProVerif with a language-agnostic template mechanism.
arXiv Detail & Related papers (2024-10-02T14:05:44Z)
FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking [57.53742155914176]
API call generation is the cornerstone of large language models' tool-using ability. Existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request. We propose an output-side optimization approach called FANTASE to address these limitations.
arXiv Detail & Related papers (2024-07-18T23:44:02Z)
WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment [49.00213183302225]
We propose a framework to induce new APIs by grounding wikiHow instruction to situated agent policies. Inspired by recent successes in large language models (LLMs) for embodied planning, we propose a few-shot prompting to steer GPT-4.
arXiv Detail & Related papers (2024-07-10T15:52:44Z)
A Classification-by-Retrieval Framework for Few-Shot Anomaly Detection to Detect API Injection Attacks [9.693391036125908]
We propose a novel unsupervised few-shot anomaly detection framework composed of two main parts. First, we train a dedicated generic language model for API based on FastText embedding. Next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach.
arXiv Detail & Related papers (2024-05-18T10:15:31Z)
Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4 [45.935748395725206]
We introduce a prompt engineering-assisted malware dynamic analysis using GPT-4. In this method, GPT-4 is employed to create explanatory text for each API call within the API sequence. BERT is used to obtain the representation of the text, from which we derive the representation of the API sequence.
arXiv Detail & Related papers (2023-12-13T17:39:44Z)
VulLibGen: Generating Names of Vulnerability-Affected Packages via a Large Language Model [13.96251273677855]
VulLibGen is a method to directly generate affected packages. It has an average accuracy of 0.806 for identifying vulnerable packages. We have submitted 60 vulnerability, affected package> pairs to GitHub Advisory.
arXiv Detail & Related papers (2023-08-09T02:02:46Z)
Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries. We propose a novel framework that emulates the process of programmers writing private code. We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z)
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications. We show how attackers can override original instructions and employed controls using Prompt Injection attacks. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks [98.15243373574518]
Pre-trained models (PTMs) have been widely used in various downstream tasks. In this work, we demonstrate the universal vulnerability of PTMs, where fine-tuned PTMs can be easily controlled by backdoor attacks.
arXiv Detail & Related papers (2021-01-18T10:18:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.