Language Model Inversion
- URL: http://arxiv.org/abs/2311.13647v1
- Date: Wed, 22 Nov 2023 19:04:04 GMT
- Title: Language Model Inversion
- Authors: John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov,
Alexander M. Rush
- Abstract summary: We show that next-token probabilities contain a surprising amount of information about the preceding text.
Our inversion method reconstructs prompts with a BLEU of $59$ and token-level F1 of $78$ and recovers $27%$ of prompts exactly.
- Score: 77.22715643068284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models produce a distribution over the next token; can we use this
information to recover the prompt tokens? We consider the problem of language
model inversion and show that next-token probabilities contain a surprising
amount of information about the preceding text. Often we can recover the text
in cases where it is hidden from the user, motivating a method for recovering
unknown prompts given only the model's current distribution output. We consider
a variety of model access scenarios, and show how even without predictions for
every token in the vocabulary we can recover the probability vector through
search. On Llama-2 7b, our inversion method reconstructs prompts with a BLEU of
$59$ and token-level F1 of $78$ and recovers $27\%$ of prompts exactly. Code
for reproducing all experiments is available at
http://github.com/jxmorris12/vec2text.
Related papers
- Language Models Can Predict Their Own Behavior [28.80639362933004]
We show that internal representation of input tokens alone can often precisely predict, not just the next token, but eventual behavior over the entire output sequence.
We leverage this capacity and learn probes on internal states to create early warning (and exit) systems.
Specifically, if the probes can confidently estimate the way the LM is going to behave, then the system will avoid generating tokens altogether and return the estimated behavior instead.
arXiv Detail & Related papers (2025-02-18T23:13:16Z) - From Language Models over Tokens to Language Models over Characters [54.123846188068384]
Modern language models are internally -- and mathematically -- distributions over token strings rather than emphcharacter strings.
This paper presents algorithms for converting token-level language models to character-level ones.
arXiv Detail & Related papers (2024-12-04T21:19:20Z) - Prompt Stability Scoring for Text Annotation with Large Language Models [0.0]
Researchers are increasingly using language models (LMs) for text annotation.
These approaches rely only on a prompt telling the model to return a given output according to a set of instructions.
The diagnose of LM outputs may nonetheless be vulnerable to small changes in the prompt design.
arXiv Detail & Related papers (2024-07-02T08:11:18Z) - Understanding and Mitigating Tokenization Bias in Language Models [6.418593476658017]
State-of-the-art language models are autoregressive and operate on subword units known as tokens.
We show that popular encoding schemes induce a sampling bias that cannot be mitigated with more training or data.
We propose a novel algorithm to obtain unbiased estimates from any language model trained on tokenized data.
arXiv Detail & Related papers (2024-06-24T17:38:02Z) - Think before you speak: Training Language Models With Pause Tokens [73.61375226378712]
Language models generate responses by producing a series of tokens in immediate succession.
What if instead we were to let the model manipulate say, $K+10$ hidden vectors, before it outputs the $(K+1)th$ token?
We operationalize this idea by performing training and inference on language models with a (learnable) $textitpause$ token.
arXiv Detail & Related papers (2023-10-03T17:32:41Z) - Robust Distortion-free Watermarks for Language Models [85.55407177530746]
We propose a methodology for planting watermarks in text from an autoregressive language model.
We generate watermarked text by mapping a sequence of random numbers to a sample from the language model.
arXiv Detail & Related papers (2023-07-28T14:52:08Z) - Can discrete information extraction prompts generalize across language
models? [36.85568212975316]
We study whether automatically-induced prompts can also be used, out-of-the-box, to probe other language models for the same information.
We introduce a way to induce prompts by mixing language models at training time that results in prompts that generalize well across models.
arXiv Detail & Related papers (2023-02-20T09:56:51Z) - Quark: Controllable Text Generation with Reinforced Unlearning [68.07749519374089]
Large-scale language models often learn behaviors that are misaligned with user expectations.
We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property.
For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods.
arXiv Detail & Related papers (2022-05-26T21:11:51Z) - Pre-trained Token-replaced Detection Model as Few-shot Learner [31.40447168356879]
We propose a novel approach to few-shot learning with pre-trained token-replaced detection models like ELECTRA.
A systematic evaluation on 16 datasets demonstrates that our approach outperforms few-shot learners with pre-trained masked language models.
arXiv Detail & Related papers (2022-03-07T09:47:53Z) - Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods
in Natural Language Processing [78.8500633981247]
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning"
Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly.
arXiv Detail & Related papers (2021-07-28T18:09:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.