Stealing the Decoding Algorithms of Language Models
- URL: http://arxiv.org/abs/2303.04729v4
- Date: Fri, 1 Dec 2023 22:34:34 GMT
- Title: Stealing the Decoding Algorithms of Language Models
- Authors: Ali Naseh, Kalpesh Krishna, Mohit Iyyer, Amir Houmansadr
- Abstract summary: A key component of generating text from modern language models (LMs) is the selection and tuning of decoding algorithms.
In this work, we show, for the first time, that an adversary with typical API access to an LM can steal the type and hyperparameters of its decoding algorithms.
Our attack is effective against popular LMs used in text generation APIs, including GPT-2, GPT-3 and GPT-Neo.
- Score: 56.369946232765656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key component of generating text from modern language models (LMs) is the
selection and tuning of decoding algorithms. These algorithms determine how to
generate text from the internal probability distribution produced by the LM.
The process of choosing a decoding algorithm and tuning its hyperparameters
takes significant time, manual effort, and computation, and it also requires
extensive human evaluation. Therefore, the identity and hyperparameters of such
decoding algorithms are considered to be extremely valuable to their owners. In
this work, we show, for the first time, that an adversary with typical API
access to an LM can steal the type and hyperparameters of its decoding
algorithms at very low monetary cost. Our attack is effective against popular
LMs used in text generation APIs, including GPT-2, GPT-3 and GPT-Neo. We
demonstrate the feasibility of stealing such information with only a few
dollars, e.g., $\$0.8$, $\$1$, $\$4$, and $\$40$ for the four versions of
GPT-3.
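For readers unfamiliar with what is being stolen: the decoding "type and hyperparameters" referred to above are, in practice, knobs such as temperature, top-k truncation, and top-p (nucleus) truncation. The snippet below is a minimal, generic sketch of those knobs, assuming standard sampling-based decoding; the function name and structure are illustrative and are not taken from the paper.

```python
# Generic sketch of common decoding hyperparameters an API provider might tune
# (temperature, top-k, top-p). Illustrative only; not the paper's implementation.
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample one token id from raw logits using common decoding knobs."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)

    # Top-k truncation: keep only the k highest-scoring tokens.
    if top_k > 0:
        k = min(top_k, logits.size)
        cutoff = np.sort(logits)[-k]
        logits = np.where(logits < cutoff, -np.inf, logits)

    # Convert to probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-p (nucleus) truncation: keep the smallest set of tokens whose
    # cumulative probability exceeds p.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()

    return int(rng.choice(len(probs), p=probs))
```

For instance, `sample_next_token(logits, temperature=0.7, top_p=0.9)` reproduces a common nucleus-sampling setup; the 0.7 and 0.9 values are exactly the kind of hyperparameters such an attack would aim to recover.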
Related papers
- Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG [57.14250086701313]
We investigate the extent to which modern LMs generate $n$-grams from their training data.
We develop Rusty-DAWG, a novel search tool inspired by indexing of genomic data.
arXiv Detail & Related papers (2024-06-18T21:31:19Z)
- The CLRS-Text Algorithmic Reasoning Language Benchmark [48.45201665463275]
CLRS-Text is a textual version of the CLRS benchmark.
CLRS-Text is capable of procedurally generating trace data for thirty diverse, challenging algorithmic tasks.
We fine-tune and evaluate various LMs as generalist executors on this benchmark.
arXiv Detail & Related papers (2024-06-06T16:29:25Z)
- GPT-who: An Information Density-based Machine-Generated Text Detector [6.111161457447324]
We propose GPT-who, the first psycholinguistically-inspired domain-agnostic statistical detector.
This detector employs UID-based features to model the unique statistical signature of texts generated by each large language model (LLM) and by humans.
We find that GPT-who can distinguish texts generated by very sophisticated LLMs, even when the overlying text is indiscernible.
arXiv Detail & Related papers (2023-10-09T23:06:05Z)
- Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z)
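As a rough illustration of how a black-box probe of a deployed decoder might look, the sketch below repeatedly requests a single next token for a fixed prompt and counts distinct outputs: determinism suggests greedy decoding (or near-zero temperature), while the number of distinct tokens lower-bounds the size of any truncated candidate set. This is an assumption-laden sketch, not the method of the paper above or of this page's main paper, and `query_api` is a hypothetical stand-in for a real API client.

```python
# Hedged sketch of one way a black-box probe might distinguish greedy decoding
# from sampling and bound the truncated candidate set. `query_api` is a
# hypothetical API call returning one generated token as a string.
from collections import Counter

def probe_decoding(query_api, prompt, n_queries=200):
    """Return a rough guess about the API's decoding configuration."""
    # Ask for one token at a time so each reply reflects a single draw
    # from the (possibly truncated) next-token distribution.
    first_tokens = Counter(query_api(prompt, max_tokens=1) for _ in range(n_queries))

    distinct = len(first_tokens)
    if distinct == 1:
        return {"deterministic": True, "note": "consistent with greedy or temperature ~ 0"}
    # With top-k / top-p truncation, the number of distinct continuations
    # observed is capped by the size of the truncated candidate set.
    return {
        "deterministic": False,
        "distinct_first_tokens": distinct,
        "note": f"sampling; truncated candidate set has at least {distinct} tokens",
    }
```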
- Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z)
- Memorization for Good: Encryption with Autoregressive Language Models [8.645826579841692]
We propose the first symmetric encryption algorithm with autoregressive language models (SELM).
We show that autoregressive LMs can encode arbitrary data into a compact real-valued vector (i.e., encryption) and then losslessly decode the vector to the original message (i.e. decryption) via random subspace optimization and greedy decoding.
arXiv Detail & Related papers (2023-05-15T05:42:34Z)
- Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z)
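A retrieval defense of the kind described in the entry above can be sketched as follows: the provider stores an embedding of every generation it serves and flags candidate text whose nearest stored generation is highly similar. This is a hedged illustration under assumed components; `embed` is a hypothetical sentence-embedding function returning a NumPy vector, and the 0.85 threshold is arbitrary.

```python
# Hedged sketch of a retrieval-based detection defense: keep embeddings of all
# served generations, then flag candidate text that is semantically close to one.
import numpy as np

def build_index(generations, embed):
    """Embed and L2-normalize all previously served generations."""
    vectors = np.stack([embed(g) for g in generations])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def looks_machine_generated(candidate, index, embed, threshold=0.85):
    """Flag the candidate if its nearest stored generation is very similar."""
    query = embed(candidate)
    query = query / np.linalg.norm(query)
    similarity = index @ query  # cosine similarity against every stored generation
    return float(similarity.max()) >= threshold
```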
- Contrastive Decoding: Open-ended Text Generation as Optimization [153.35961722855686]
We propose contrastive decoding (CD), a reliable decoding approach.
It is inspired by the fact that the failures of larger LMs are even more prevalent in smaller LMs.
CD requires zero additional training, and produces higher quality text than decoding from the larger LM alone.
arXiv Detail & Related papers (2022-10-27T00:58:21Z)
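To make the contrastive decoding idea in the last entry concrete, here is a minimal sketch of the scoring step as commonly described: candidate tokens are restricted to those the larger "expert" LM finds plausible, then ranked by the gap between expert and smaller "amateur" log-probabilities. The alpha cutoff and function shape are illustrative, not the paper's exact formulation.

```python
# Minimal sketch of a contrastive-decoding-style score: reward tokens the large
# "expert" LM likes much more than the small "amateur" LM, restricted to tokens
# the expert itself finds plausible. Illustrative values, not the paper's code.
import numpy as np

def contrastive_scores(expert_logprobs, amateur_logprobs, alpha=0.1):
    """Return per-token contrastive scores; implausible tokens get -inf."""
    expert_logprobs = np.asarray(expert_logprobs, dtype=np.float64)
    amateur_logprobs = np.asarray(amateur_logprobs, dtype=np.float64)

    # Plausibility constraint: keep tokens whose expert probability is at
    # least alpha times that of the expert's most likely token.
    cutoff = np.log(alpha) + expert_logprobs.max()
    plausible = expert_logprobs >= cutoff

    # Contrast step: expert log-probability minus amateur log-probability.
    return np.where(plausible, expert_logprobs - amateur_logprobs, -np.inf)

# Greedy contrastive step: pick the highest-scoring plausible token, e.g.
# next_token = int(np.argmax(contrastive_scores(expert_lp, amateur_lp)))
```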