Automated Detection of Pre-training Text in Black-box LLMs
- URL: http://arxiv.org/abs/2506.19399v1
- Date: Tue, 24 Jun 2025 08:08:15 GMT
- Title: Automated Detection of Pre-training Text in Black-box LLMs
- Authors: Ruihan Hu, Yu-Ming Shang, Jiankun Peng, Wei Luo, Yazhe Wang, Xi Zhang,
- Abstract summary: VeilProbe is a framework for automatically detecting pre-training texts in a black-box setting without human intervention.<n>It infers the latent mapping feature between the input text and the corresponding output suffix generated by the Large Language Models.<n>It performs the key token perturbations to obtain more distinguishable membership features.
- Score: 11.227481657336385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting whether a given text is a member of the pre-training data of Large Language Models (LLMs) is crucial for ensuring data privacy and copyright protection. Most existing methods rely on the LLM's hidden information (e.g., model parameters or token probabilities), making them ineffective in the black-box setting, where only input and output texts are accessible. Although some methods have been proposed for the black-box setting, they rely on massive manual efforts such as designing complicated questions or instructions. To address these issues, we propose VeilProbe, the first framework for automatically detecting LLMs' pre-training texts in a black-box setting without human intervention. VeilProbe utilizes a sequence-to-sequence mapping model to infer the latent mapping feature between the input text and the corresponding output suffix generated by the LLM. Then it performs the key token perturbations to obtain more distinguishable membership features. Additionally, considering real-world scenarios where the ground-truth training text samples are limited, a prototype-based membership classifier is introduced to alleviate the overfitting issue. Extensive evaluations on three widely used datasets demonstrate that our framework is effective and superior in the black-box setting.
Related papers
- Causal2Vec: Improving Decoder-only LLMs as Versatile Embedding Models [3.8688081072587326]
Causal2Vec is a general-purpose embedding model tailored to enhance the performance of decoder-only large language models.<n>We first employ a lightweight BERT-style model to pre-encode the input text into a single Contextual token.<n>To mitigate the recency bias by last-token pooling, we introduced the last hidden states of Contextual and EOS tokens as the final text embedding.
arXiv Detail & Related papers (2025-07-31T10:01:11Z) - Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors [65.27124213266491]
We propose textbfContrastive textbfParaphrase textbfAttack (CoPA), a training-free method that effectively deceives text detectors.<n>CoPA constructs an auxiliary machine-like word distribution as a contrast to the human-like distribution generated by large language models.<n>Our theoretical analysis suggests the superiority of the proposed attack.
arXiv Detail & Related papers (2025-05-21T10:08:39Z) - In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality.<n>To handle these challenges, a direct solution is to generate high-confidence'' data from unsupervised downstream tasks.<n>We propose a novel approach, pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z) - Large Language Models as Carriers of Hidden Messages [0.0]
Simple fine-tuning can embed hidden text into large language models (LLMs), which is revealed only when triggered by a specific query.<n>Our work demonstrates that embedding hidden text via fine-tuning, although seemingly secure due to the vast number of potential triggers, is vulnerable to extraction.<n>We introduce an extraction attack called Unconditional Token Forcing (UTF), which iteratively feeds tokens from the LLM's vocabulary to reveal sequences with high token probabilities, indicating hidden text candidates.
arXiv Detail & Related papers (2024-06-04T16:49:06Z) - Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore [51.65730053591696]
We propose a simple yet effective black-box zero-shot detection approach based on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts.<n> Experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods.
arXiv Detail & Related papers (2024-05-07T12:57:01Z) - Generative Text Steganography with Large Language Model [10.572149957139736]
Black-box generative text steganographic method based on user interfaces of large language models, which is called LLM-Stega.
We first construct a keyword set and design a new encrypted steganographic mapping to embed secret messages.
Comprehensive experiments demonstrate that the proposed LLM-Stega outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-04-16T02:19:28Z) - TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model [17.77384627944455]
Existing scene text spotters are designed to locate and transcribe texts from images.
Our proposed scene text spotter leverages advanced PLMs to enhance performance without fine-grained detection.
Benefiting from the comprehensive language knowledge gained during the pre-training phase, the PLM-based recognition module effectively handles complex scenarios.
arXiv Detail & Related papers (2024-03-15T06:38:25Z) - Prompt Highlighter: Interactive Control for Multi-Modal LLMs [50.830448437285355]
This study targets a critical aspect of multi-modal LLMs' (LLMs&VLMs) inference: explicit controllable text generation.
We introduce a novel inference method, Prompt Highlighter, which enables users to highlight specific prompt spans to interactively control the focus during generation.
We find that, during inference, guiding the models with highlighted tokens through the attention weights leads to more desired outputs.
arXiv Detail & Related papers (2023-12-07T13:53:29Z) - Token Prediction as Implicit Classification to Identify LLM-Generated
Text [37.89852204279844]
This paper introduces a novel approach for identifying the possible large language models (LLMs) involved in text generation.
Instead of adding an additional classification layer to a base LM, we reframe the classification task as a next-token prediction task.
We utilize the Text-to-Text Transfer Transformer (T5) model as the backbone for our experiments.
arXiv Detail & Related papers (2023-11-15T06:33:52Z) - SeqXGPT: Sentence-Level AI-Generated Text Detection [62.3792779440284]
We introduce a sentence-level detection challenge by synthesizing documents polished with large language models (LLMs)
We then propose textbfSequence textbfX (Check) textbfGPT, a novel method that utilizes log probability lists from white-box LLMs as features for sentence-level AIGT detection.
arXiv Detail & Related papers (2023-10-13T07:18:53Z) - Language Models as Black-Box Optimizers for Vision-Language Models [62.80817942316398]
Vision-language models (VLMs) pre-trained on web-scale datasets have demonstrated remarkable capabilities on downstream tasks when fine-tuned with minimal data.
We aim to develop a black-box approach to optimize VLMs through natural language prompts.
arXiv Detail & Related papers (2023-09-12T04:03:41Z) - DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection [56.513637720967566]
Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets.
Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics.
We propose to extract deep intrinsic characteristics of the black-box model generated texts.
arXiv Detail & Related papers (2023-05-21T17:26:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.