Related papers: Smaller Language Models are Better Black-box Machine-Generated Text Detectors

Smaller Language Models are Better Black-box Machine-Generated Text Detectors

URL: http://arxiv.org/abs/2305.09859v4
Date: Sat, 24 Feb 2024 19:47:14 GMT
Title: Smaller Language Models are Better Black-box Machine-Generated Text Detectors
Authors: Niloofar Mireshghallah, Justus Mattern, Sicun Gao, Reza Shokri, Taylor Berg-Kirkpatrick
Abstract summary: Small and partially-trained models are better universal text detectors. We find that whether the detector and generator were trained on the same data is not critically important to the detection success. For instance, the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has AUC of 0.45.
Score: 56.36291277897995
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the advent of fluent generative language models that can produce convincing utterances very similar to those written by humans, distinguishing whether a piece of text is machine-generated or human-written becomes more challenging and more important, as such models could be used to spread misinformation, fake news, fake reviews and to mimic certain authors and figures. To this end, there have been a slew of methods proposed to detect machine-generated text. Most of these methods need access to the logits of the target model or need the ability to sample from the target. One such black-box detection method relies on the observation that generated text is locally optimal under the likelihood function of the generator, while human-written text is not. We find that overall, smaller and partially-trained models are better universal text detectors: they can more precisely detect text generated from both small and larger models. Interestingly, we find that whether the detector and generator were trained on the same data is not critically important to the detection success. For instance the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has AUC of 0.45.

Related papers

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability [62.285407189502216]
Detecting texts generated by Large Language Models (LLMs) could cause grave mistakes due to incorrect decisions. We introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process. We show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
arXiv Detail & Related papers (2025-02-17T01:15:07Z)
Applying Ensemble Methods to Model-Agnostic Machine-Generated Text Detection [0.0]
We study the problem of detecting machine-generated text when the large language model it is possibly derived from is unknown. We use a zero-shot model for machine-generated text detection which is highly accurate when the generative (or base) language model is the same as the discriminative (or scoring) language model.
arXiv Detail & Related papers (2024-06-18T12:58:01Z)
Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore [51.65730053591696]
We propose a simple but effective black-box zero-shot detection approach. It is predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts. Our method achieves an average AUROC of 98.7% and shows strong robustness against paraphrase and adversarial perturbation attacks.
arXiv Detail & Related papers (2024-05-07T12:57:01Z)
Few-Shot Detection of Machine-Generated Text using Style Representations [4.326503887981912]
Language models that convincingly mimic human writing pose a significant risk of abuse. We propose to leverage representations of writing style estimated from human-authored text. We find that features effective at distinguishing among human authors are also effective at distinguishing human from machine authors.
arXiv Detail & Related papers (2024-01-12T17:26:51Z)
Multiscale Positive-Unlabeled Detection of AI-Generated Texts [27.956604193427772]
Multiscale Positive-Unlabeled (MPU) training framework is proposed to address the difficulty of short-text detection. MPU method augments detection performance on long AI-generated texts, and significantly improves short-text detection of language model detectors.
arXiv Detail & Related papers (2023-05-29T15:25:00Z)
DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection [56.513637720967566]
Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets. Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics. We propose to extract deep intrinsic characteristics of the black-box model generated texts.
arXiv Detail & Related papers (2023-05-21T17:26:16Z)
Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering. Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking. We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z)
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature [143.5381108333212]
We show that text sampled from an large language model tends to occupy negative curvature regions of the model's log probability function. We then define a new curvature-based criterion for judging if a passage is generated from a given LLM. We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection.
arXiv Detail & Related papers (2023-01-26T18:44:06Z)
Unsupervised and Distributional Detection of Machine-Generated Text [1.552214657968262]
The power of natural language generation models has provoked a flurry of interest in automatic methods to detect if a piece of text is human or machine-authored. We propose a method to detect those machine-generated documents leveraging repeated higher-order n-grams. Our experiments show that leveraging that signal allows us to rank suspicious documents accurately.
arXiv Detail & Related papers (2021-11-04T14:07:46Z)
Learning Sparse Prototypes for Text Generation [120.38555855991562]
Prototype-driven text generation is inefficient at test time as a result of needing to store and index the entire training corpus. We propose a novel generative model that automatically learns a sparse prototype support set that achieves strong language modeling performance. In experiments, our model outperforms previous prototype-driven language models while achieving up to a 1000x memory reduction.
arXiv Detail & Related papers (2020-06-29T19:41:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.