AuthentiGPT: Detecting Machine-Generated Text via Black-Box Language
Models Denoising
- URL: http://arxiv.org/abs/2311.07700v1
- Date: Mon, 13 Nov 2023 19:36:54 GMT
- Title: AuthentiGPT: Detecting Machine-Generated Text via Black-Box Language
Models Denoising
- Authors: Zhen Guo, Shangdi Yu
- Abstract summary: Large language models (LLMs) create text that closely mimics human writing, which can lead to potential misuse.
We present AuthentiGPT, an efficient classifier that distinguishes between machine-generated and human-written texts.
With a 0.918 AUROC score on a domain-specific dataset, AuthentiGPT demonstrates its effectiveness over other commercial algorithms.
- Score: 4.924903495092775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have opened up enormous opportunities while
simultaneously posing ethical dilemmas. One of the major concerns is their
ability to create text that closely mimics human writing, which can lead to
potential misuse, such as academic misconduct, disinformation, and fraud. To
address this problem, we present AuthentiGPT, an efficient classifier that
distinguishes between machine-generated and human-written texts. Under the
assumption that human-written text resides outside the distribution of
machine-generated text, AuthentiGPT leverages a black-box LLM to denoise input
text with artificially added noise, and then semantically compares the denoised
text with the original to determine if the content is machine-generated. With
only one trainable parameter, AuthentiGPT eliminates the need for a large
training dataset, watermarking the LLM's output, or computing the
log-likelihood. Importantly, the detection capability of AuthentiGPT can be
easily adapted to any generative language model. With a 0.918 AUROC score on a
domain-specific dataset, AuthentiGPT demonstrates its effectiveness over other
commercial algorithms, highlighting its potential for detecting
machine-generated text in academic settings.
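As a rough illustration of the pipeline the abstract describes, the sketch below masks part of the input, asks a black-box LLM to reconstruct it, and thresholds the semantic similarity between the original and the reconstruction. The denoising call, the similarity measure, and the threshold value are all placeholder assumptions; the threshold stands in for the method's single trainable parameter.

```python
import math
import random
from collections import Counter

def add_noise(text: str, mask_ratio: float = 0.2, seed: int = 0) -> str:
    """Randomly replace a fraction of words with a mask token."""
    rng = random.Random(seed)
    return " ".join("<mask>" if rng.random() < mask_ratio else w
                    for w in text.split())

def denoise_with_llm(noisy_text: str) -> str:
    """Placeholder for a black-box LLM call that fills in the masks,
    e.g. by prompting 'Reconstruct the original passage'. Swap in a
    real client here; no particular API is assumed."""
    raise NotImplementedError

def similarity(a: str, b: str) -> float:
    """Toy bag-of-words cosine. The paper compares texts semantically,
    so a sentence-embedding model would be used in practice."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_machine_generated(text: str, threshold: float = 0.9) -> bool:
    """If the text lies inside the LLM's own distribution, the LLM
    reconstructs it almost verbatim (high similarity); human-written
    text drifts further under the noise-denoise round trip.
    `threshold` plays the role of the single trainable parameter."""
    denoised = denoise_with_llm(add_noise(text))
    return similarity(text, denoised) >= threshold
```

In practice the threshold would be fit on a small labeled sample, which is what keeps the detector training-light compared to supervised classifiers.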
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection [87.43727192273772]
It is often hard to tell whether a piece of text was human-written or machine-generated.
We present LLM-DetectAIve, a tool designed for fine-grained detection.
It supports four categories: (i) human-written, (ii) machine-generated, (iii) machine-written, then machine-humanized, and (iv) human-written, then machine-polished.
arXiv Detail & Related papers (2024-08-08T07:43:17Z)
- IDT: Dual-Task Adversarial Attacks for Privacy Protection [8.312362092693377]
Methods to protect privacy can involve using representations inside models that cannot be used to detect sensitive attributes.
We propose IDT, a method that analyses predictions made by auxiliary and interpretable models to identify which tokens are important to change.
We evaluate our method on different NLP datasets suitable for different tasks.
arXiv Detail & Related papers (2024-06-28T04:14:35Z)
- GPT-who: An Information Density-based Machine-Generated Text Detector [6.111161457447324]
We propose GPT-who, the first psycholinguistically-inspired domain-agnostic statistical detector.
This detector employs Uniform Information Density (UID)-based features to model the unique statistical signature of LLM-generated and human-written texts.
We find that GPT-who can distinguish texts generated by very sophisticated LLMs, even when the underlying text is indiscernible (a toy version of such features is sketched after this entry).
arXiv Detail & Related papers (2023-10-09T23:06:05Z)
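As a purely illustrative sketch of what UID-style features over per-token surprisals might look like (the paper's actual feature set is richer; the statistics below are assumptions):

```python
def uid_features(surprisals: list[float]) -> list[float]:
    """Illustrative UID-style features computed from per-token
    surprisals (negative log-probabilities under any off-the-shelf LM)."""
    n = len(surprisals)
    mean = sum(surprisals) / n
    variance = sum((s - mean) ** 2 for s in surprisals) / n
    # UID's core intuition: fluent text spreads information evenly,
    # so large surprisal jumps between neighbouring tokens are telling.
    jumps = [(surprisals[i] - surprisals[i - 1]) ** 2 for i in range(1, n)]
    local_unevenness = sum(jumps) / len(jumps) if jumps else 0.0
    return [mean, variance, local_unevenness]
```

Such features would then feed a lightweight classifier, e.g. logistic regression.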
- DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection [56.513637720967566]
Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets.
Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics.
We propose to extract deep intrinsic characteristics of texts generated by black-box models.
arXiv Detail & Related papers (2023-05-21T17:26:16Z)
- Smaller Language Models are Better Black-box Machine-Generated Text Detectors [56.36291277897995]
Small and partially-trained models are better universal text detectors.
We find that whether the detector and generator were trained on the same data is not critically important to the detection success.
For instance, the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has an AUC of 0.45 (a minimal scoring sketch follows this entry).
arXiv Detail & Related papers (2023-05-17T00:09:08Z)
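A minimal sketch of a small-model scoring function in line with the finding above, assuming the Hugging Face transformers library (the paper's exact detection score is not reproduced here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").eval()

@torch.no_grad()
def machine_score(text: str) -> float:
    """Higher = more machine-like: text the small LM finds highly
    predictable (low mean per-token loss) receives a higher score."""
    ids = tok(text, return_tensors="pt", truncation=True).input_ids
    return -model(ids, labels=ids).loss.item()
```

AUC figures like those quoted above would come from applying such a score to a mixed pool of human and machine texts and computing AUROC (e.g. sklearn.metrics.roc_auc_score).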
- Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider (sketched after this entry).
arXiv Detail & Related papers (2023-03-23T16:29:27Z)
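The retrieval defense is simple enough to sketch: the provider indexes every generation and flags any candidate whose nearest stored generation is semantically close. The embedding function and threshold below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

class GenerationIndex:
    """Kept by the API provider; stores an embedding of every output."""

    def __init__(self, embed):  # embed: str -> unit-norm np.ndarray
        self.embed = embed
        self.vectors: list[np.ndarray] = []

    def add(self, generation: str) -> None:
        # Called once per text the LLM emits.
        self.vectors.append(self.embed(generation))

    def is_machine_generated(self, candidate: str,
                             threshold: float = 0.85) -> bool:
        # Paraphrasing changes surface form but preserves semantics,
        # so a paraphrased generation still retrieves its source.
        if not self.vectors:
            return False
        sims = np.stack(self.vectors) @ self.embed(candidate)
        return float(sims.max()) >= threshold
```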
- Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
- Towards Computationally Verifiable Semantic Grounding for Language Models [18.887697890538455]
The paper conceptualizes the LM as a conditional model generating text given a desired semantic message formalized as a set of entity-relationship triples.
It embeds the LM in an auto-encoder by feeding its output to a semantic parser whose output is in the same representation domain as the input message.
We show that our proposed approaches significantly improve on the greedy search baseline (a toy round-trip consistency check is sketched below).
arXiv Detail & Related papers (2022-11-16T17:35:52Z)
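A rough sketch of that auto-encoding consistency check, with both model calls left as hypothetical placeholders: generate text from entity-relationship triples, parse it back, and score the overlap.

```python
Triple = tuple[str, str, str]  # (subject, relation, object)

def generate_text(triples: set[Triple]) -> str:
    """Placeholder: conditional LM mapping triples to text."""
    raise NotImplementedError

def parse_triples(text: str) -> set[Triple]:
    """Placeholder: semantic parser mapping text back to triples."""
    raise NotImplementedError

def grounding_f1(triples: set[Triple]) -> float:
    """Round-trip score: how faithfully the generated text preserves
    the input message, measured as F1 over recovered triples."""
    predicted = parse_triples(generate_text(triples))
    tp = len(triples & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(triples) if triples else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```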