Smaller Language Models are Better Black-box Machine-Generated Text
Detectors
- URL: http://arxiv.org/abs/2305.09859v4
- Date: Sat, 24 Feb 2024 19:47:14 GMT
- Authors: Niloofar Mireshghallah, Justus Mattern, Sicun Gao, Reza Shokri, Taylor
Berg-Kirkpatrick
- Abstract summary: Small and partially-trained models are better universal text detectors.
We find that whether the detector and generator were trained on the same data is not critically important to the detection success.
For instance, the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has an AUC of 0.45.
- Score: 56.36291277897995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the advent of fluent generative language models that can produce
convincing utterances very similar to those written by humans, distinguishing
whether a piece of text is machine-generated or human-written becomes more
challenging and more important, as such models could be used to spread
misinformation, fake news, fake reviews and to mimic certain authors and
figures. To this end, there have been a slew of methods proposed to detect
machine-generated text. Most of these methods need access to the logits of the
target model or need the ability to sample from the target. One such black-box
detection method relies on the observation that generated text is locally
optimal under the likelihood function of the generator, while human-written
text is not. We find that overall, smaller and partially-trained models are
better universal text detectors: they can more precisely detect text generated
from both small and larger models. Interestingly, we find that whether the
detector and generator were trained on the same data is not critically
important to the detection success. For instance, the OPT-125M model has an AUC
of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT
family, GPTJ-6B, has an AUC of 0.45.
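The local-optimality observation in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the paper's implementation: a smoothed unigram model stands in for the detector language model, perturbations are random single-token substitutions, and the names `train_unigram`, `perturb`, `local_optimality_score`, and `auc` are hypothetical. The score compares the log-likelihood of a text with the average log-likelihood of its perturbed neighbours; a text that sits near a local optimum of the detector's likelihood should score higher. The `auc` helper computes the rank-based AUC of the kind quoted above.

```python
import math
import random
from collections import Counter


def train_unigram(corpus_tokens, alpha=1.0):
    """Return (log_prob, vocab) for a toy add-alpha smoothed unigram model.

    Assumption: this stands in for a real detector language model.
    """
    counts = Counter(corpus_tokens)
    vocab = sorted(counts)
    total = sum(counts.values()) + alpha * len(vocab)

    def log_prob(tokens):
        # Sum of per-token log-probabilities (tokens are independent here).
        return sum(math.log((counts[t] + alpha) / total) for t in tokens)

    return log_prob, vocab


def perturb(tokens, vocab, rng):
    """Replace one randomly chosen token with a random vocabulary token.

    Assumption: a simple stand-in for a real perturbation procedure.
    """
    out = list(tokens)
    out[rng.randrange(len(out))] = rng.choice(vocab)
    return out


def local_optimality_score(tokens, log_prob, vocab, n_perturb=20, seed=0):
    """log p(x) minus the mean log p over perturbed copies of x."""
    rng = random.Random(seed)
    original = log_prob(tokens)
    perturbed = [log_prob(perturb(tokens, vocab, rng)) for _ in range(n_perturb)]
    return original - sum(perturbed) / n_perturb


def auc(positive_scores, negative_scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positive_scores) * len(negative_scores))


# Toy usage: one text built from the model's most likely token ("machine-like")
# and one built from its least likely tokens ("human-like").
log_prob, vocab = train_unigram("the the the the cat sat on the mat".split())
machine_scores = [local_optimality_score(["the"] * 5, log_prob, vocab)]
human_scores = [local_optimality_score(["cat", "sat", "on", "mat", "cat"], log_prob, vocab)]
print(auc(machine_scores, human_scores))
```

In a realistic setting the unigram model would be replaced by a neural language model's log-likelihood, and the AUC would be computed over many human-written and machine-generated samples rather than one of each.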
Related papers
- Applying Ensemble Methods to Model-Agnostic Machine-Generated Text Detection [0.0]
We study the problem of detecting machine-generated text when the large language model it is possibly derived from is unknown.
We use a zero-shot model for machine-generated text detection which is highly accurate when the generative (or base) language model is the same as the discriminative (or scoring) language model.
arXiv Detail & Related papers (2024-06-18T12:58:01Z)
- Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore [51.65730053591696]
We propose a simple but effective black-box zero-shot detection approach.
It is predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts.
Our method achieves an average AUROC of 98.7% and shows strong robustness against paraphrase and adversarial perturbation attacks.
arXiv Detail & Related papers (2024-05-07T12:57:01Z)
- Few-Shot Detection of Machine-Generated Text using Style Representations [4.326503887981912]
Language models that convincingly mimic human writing pose a significant risk of abuse.
We propose to leverage representations of writing style estimated from human-authored text.
We find that features effective at distinguishing among human authors are also effective at distinguishing human from machine authors.
arXiv Detail & Related papers (2024-01-12T17:26:51Z)
- Multiscale Positive-Unlabeled Detection of AI-Generated Texts [27.956604193427772]
A Multiscale Positive-Unlabeled (MPU) training framework is proposed to address the difficulty of short-text detection.
The MPU method improves detection performance on long AI-generated texts and significantly improves short-text detection for language model detectors.
arXiv Detail & Related papers (2023-05-29T15:25:00Z)
- DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection [56.513637720967566]
Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets.
Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics.
We propose to extract deep intrinsic characteristics of the black-box model generated texts.
arXiv Detail & Related papers (2023-05-21T17:26:16Z)
- Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z)
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature [143.5381108333212]
We show that text sampled from a large language model tends to occupy negative curvature regions of the model's log probability function.
We then define a new curvature-based criterion for judging if a passage is generated from a given LLM.
We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection.
arXiv Detail & Related papers (2023-01-26T18:44:06Z)
- Unsupervised and Distributional Detection of Machine-Generated Text [1.552214657968262]
The power of natural language generation models has provoked a flurry of interest in automatic methods to detect if a piece of text is human or machine-authored.
We propose a method to detect those machine-generated documents leveraging repeated higher-order n-grams.
Our experiments show that leveraging that signal allows us to rank suspicious documents accurately.
arXiv Detail & Related papers (2021-11-04T14:07:46Z)
- Learning Sparse Prototypes for Text Generation [120.38555855991562]
Prototype-driven text generation is inefficient at test time as a result of needing to store and index the entire training corpus.
We propose a novel generative model that automatically learns a sparse prototype support set that achieves strong language modeling performance.
In experiments, our model outperforms previous prototype-driven language models while achieving up to a 1000x memory reduction.
arXiv Detail & Related papers (2020-06-29T19:41:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.