Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities
- URL: http://arxiv.org/abs/2501.02406v3
- Date: Sat, 12 Apr 2025 18:05:42 GMT
- Title: Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities
- Authors: Tara Radvand, Mojtaba Abdolmaleki, Mohamed Mostagir, Ambuj Tewari
- Abstract summary: Verifying the provenance of content is crucial to the function of many organizations, e.g., educational institutions, social media platforms, firms, etc. This problem is becoming increasingly challenging as text generated by Large Language Models (LLMs) becomes almost indistinguishable from human-generated content. We show that our tests' type I and type II errors decrease exponentially as text length increases. Practically, our work enables guaranteed finding of the origin of harmful or false LLM-generated text, which can be useful for combating misinformation and compliance with emerging AI regulations.
- Score: 13.657259851747126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Verifying the provenance of content is crucial to the function of many organizations, e.g., educational institutions, social media platforms, firms, etc. This problem is becoming increasingly challenging as text generated by Large Language Models (LLMs) becomes almost indistinguishable from human-generated content. In addition, many institutions utilize in-house LLMs and want to ensure that external, non-sanctioned LLMs do not produce content within the institution. We answer the following question: Given a piece of text, can we identify whether it was produced by LLM $A$ or $B$ (where $B$ can be a human)? We model LLM-generated text as a sequential stochastic process with complete dependence on history and design zero-shot statistical tests to distinguish between (i) the text generated by two different sets of LLMs $A$ (in-house) and $B$ (non-sanctioned) and also (ii) LLM-generated and human-generated texts. We prove that our tests' type I and type II errors decrease exponentially as text length increases. For designing our tests for a given string, we demonstrate that if the string is generated by the evaluator model $A$, the log-perplexity of the string under $A$ converges to the average entropy of the string under $A$, except with an exponentially small probability in the string length. We also show that if $B$ generates the text, except with an exponentially small probability in string length, the log-perplexity of the string under $A$ converges to the average cross-entropy of $B$ and $A$. For our experiments: First, we present experiments using open-source LLMs to support our theoretical results, and then we provide experiments in a black-box setting with adversarial attacks. Practically, our work enables guaranteed finding of the origin of harmful or false LLM-generated text, which can be useful for combating misinformation and compliance with emerging AI regulations.
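To make the test described in the abstract concrete, the following is a minimal sketch, assuming the evaluator model $A$ is an open-source Hugging Face causal LM (e.g., "gpt2") and using an illustrative threshold rather than the paper's concentration-based calibration. It computes the log-perplexity of a string under $A$ and the average entropy of $A$'s next-token distributions along the string, and attributes the string to $A$ when the gap between the two is small; per the abstract, when $B$ generated the text the log-perplexity instead concentrates near the average cross-entropy of $B$ and $A$.

```python
# Minimal sketch of the log-perplexity-vs-entropy test described in the abstract.
# Assumes the evaluator model A is a Hugging Face causal LM ("gpt2" here);
# the decision threshold `tau` is a placeholder, not the paper's calibrated value.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity_entropy_gap(text: str, model_name: str = "gpt2") -> float:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # (1, T, V)

    # Next-token distributions for positions predicting tokens 1..T-1
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)  # (1, T-1, V)
    targets = ids[:, 1:]                                      # (1, T-1)

    # Log-perplexity: average negative log-likelihood of the observed tokens under A
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    log_perplexity = -token_ll.mean().item()

    # Average entropy of A's next-token distributions along the string
    probs = log_probs.exp()
    avg_entropy = -(probs * log_probs).sum(dim=-1).mean().item()

    # If A generated the text, log-perplexity concentrates near avg entropy (small gap);
    # if B generated it, log-perplexity concentrates near the cross-entropy of B and A.
    return log_perplexity - avg_entropy

if __name__ == "__main__":
    gap = perplexity_entropy_gap("The quick brown fox jumps over the lazy dog.")
    tau = 0.5  # illustrative threshold; the paper derives thresholds from concentration bounds
    print("attributed to A" if gap < tau else "attributed to B / human", gap)
```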
Related papers
- Does a Large Language Model Really Speak in Human-Like Language? [0.5735035463793009]
Large Language Models (LLMs) have recently emerged, attracting considerable attention due to their ability to generate highly natural, human-like text. This study compares the latent community structures of LLM-generated text and human-written text. Our results indicate that GPT-generated text remains distinct from human-authored text.
arXiv Detail & Related papers (2025-01-02T14:13:44Z) - Reasoning Robustness of LLMs to Adversarial Typographical Errors [49.99118660264703]
Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning using Chain-of-Thought (CoT) prompting.
We study the reasoning robustness of LLMs to typographical errors, which can naturally occur in users' queries.
We design an Adversarial Typo Attack ($\texttt{ATA}$) algorithm that iteratively samples typos for words that are important to the query and selects the edit that is most likely to succeed in attacking.
arXiv Detail & Related papers (2024-11-08T05:54:05Z) - Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection [43.66875548677324]
We train AI-generated (AIG) text classifiers using the LibAUC library for training classifiers with imbalanced datasets.
Our results in the Deepfake Text dataset show that AIG-text detection varies across domains, with scientific writing being relatively challenging.
In the Rewritten Ivy Panda dataset focusing on student essays, we find that the OpenAI family of LLMs was substantially difficult for our classifiers to distinguish from human texts.
arXiv Detail & Related papers (2024-10-18T21:42:37Z) - Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG [57.14250086701313]
We investigate the extent to which modern LMs generate $n$-grams from their training data.
We develop Rusty-DAWG, a novel search tool inspired by indexing of genomic data.
arXiv Detail & Related papers (2024-06-18T21:31:19Z) - Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore [51.65730053591696]
We propose a simple yet effective black-box zero-shot detection approach based on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts.
Experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods.
arXiv Detail & Related papers (2024-05-07T12:57:01Z) - Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code [11.194047962236793]
We present a method for evaluating the correctness and robustness of instruction-tuned large language models (LLMs) for code generation via a new benchmark, Turbulence.
Turbulence consists of a large set of natural language $\textit{question templates}$, each of which is a programming problem, parameterised so that it can be asked in many different forms.
From a single question template, it is possible to ask an LLM a $\textit{neighbourhood}$ of very similar programming questions, and assess the correctness of the result returned for each question.
arXiv Detail & Related papers (2023-12-22T17:29:08Z) - Do large language models and humans have similar behaviors in causal inference with script knowledge? [13.140513796801915]
We study the processing of an event $B$ in a script-based story.
In our manipulation, event $A$ is stated, negated, or omitted in an earlier section of the text.
arXiv Detail & Related papers (2023-11-13T13:05:15Z) - SeqXGPT: Sentence-Level AI-Generated Text Detection [62.3792779440284]
We introduce a sentence-level detection challenge by synthesizing documents polished with large language models (LLMs).
We then propose Sequence X (Check) GPT (SeqXGPT), a novel method that utilizes log probability lists from white-box LLMs as features for sentence-level AIGT detection.
arXiv Detail & Related papers (2023-10-13T07:18:53Z) - LLMDet: A Third Party Large Language Models Generated Text Detection Tool [119.0952092533317]
Text generated by large language models (LLMs) is remarkably close to high-quality human-authored text.
Existing detection tools can only differentiate between machine-generated and human-authored text.
We propose LLMDet, a model-specific, secure, efficient, and extendable detection tool.
arXiv Detail & Related papers (2023-05-24T10:45:16Z) - Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study [44.39031420687302]
Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks.
We try to understand this by designing a benchmark to evaluate the structural understanding capabilities of LLMs.
We propose $\textit{self-augmentation}$ for effective structural prompting, such as critical value / range identification.
arXiv Detail & Related papers (2023-05-22T14:23:46Z) - DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection [56.513637720967566]
Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets.
Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics.
We propose to extract deep intrinsic characteristics of texts generated by a black-box model.
arXiv Detail & Related papers (2023-05-21T17:26:16Z) - You can't pick your neighbors, or can you? When and how to rely on retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores.
One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model; a sketch of this interpolation appears after this list.
We empirically measure the effectiveness of our approach on two English language modeling datasets.
arXiv Detail & Related papers (2022-10-28T02:57:40Z)
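For the $k$NN-LM entry above, here is a minimal sketch of the interpolation it describes, assuming the base LM's next-token distribution and the retrieved neighbors (target token ids and distances) are already available; all names and the value of `lam` are illustrative.

```python
# Minimal sketch of kNN-LM interpolation as described in the entry above:
# the final next-token distribution mixes the base LM with a softmax over
# negative distances to retrieved datastore neighbors. Names are illustrative.
import numpy as np

def knn_lm_interpolate(p_lm: np.ndarray,
                       neighbor_tokens: np.ndarray,
                       neighbor_dists: np.ndarray,
                       vocab_size: int,
                       lam: float = 0.25) -> np.ndarray:
    """p_lm: (V,) base LM distribution; neighbor_tokens: (k,) token ids of the
    retrieved neighbors' targets; neighbor_dists: (k,) distances in representation space."""
    # Stable softmax over negative distances gives each neighbor a weight
    scores = -neighbor_dists
    w = np.exp(scores - np.max(scores))
    w /= w.sum()

    # Scatter neighbor weights onto their target tokens to form p_kNN
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, neighbor_tokens, w)

    # Interpolate: p = lam * p_kNN + (1 - lam) * p_LM
    return lam * p_knn + (1.0 - lam) * p_lm
```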
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.