How Reliable Are AI-Generated-Text Detectors? An Assessment Framework
Using Evasive Soft Prompts
- URL: http://arxiv.org/abs/2310.05095v1
- Date: Sun, 8 Oct 2023 09:53:46 GMT
- Title: How Reliable Are AI-Generated-Text Detectors? An Assessment Framework
Using Evasive Soft Prompts
- Authors: Tharindu Kumarage, Paras Sheth, Raha Moraffah, Joshua Garland, Huan
Liu
- Abstract summary: We propose a novel approach that can prompt any PLM to generate text that evades high-performing detectors.
The proposed approach introduces a universal evasive prompt, a novel type of soft prompt, which guides PLMs to produce "human-like" text that can mislead the detectors.
We conduct extensive experiments to evaluate the efficacy of the evasive soft prompts in their evasion of state-of-the-art detectors.
- Score: 14.175243473740727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, there has been a rapid proliferation of AI-generated text,
primarily driven by the release of powerful pre-trained language models (PLMs).
To address the issue of misuse associated with AI-generated text, various
high-performing detectors have been developed, including the OpenAI detector
and the Stanford DetectGPT. In our study, we ask how reliable these detectors
are. We answer the question by designing a novel approach that can prompt any
PLM to generate text that evades these high-performing detectors. The proposed
approach introduces a universal evasive prompt, a novel type of soft prompt,
which guides PLMs to produce "human-like" text that can mislead the
detectors. The novel universal evasive prompt is achieved in two steps: First,
we create an evasive soft prompt tailored to a specific PLM through prompt
tuning; and then, we leverage the transferability of soft prompts to transfer
the learned evasive soft prompt from one PLM to another. Employing multiple
PLMs in various writing tasks, we conduct extensive experiments to evaluate the
efficacy of the evasive soft prompts in their evasion of state-of-the-art
detectors.
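To make the two-step recipe concrete, here is a minimal PyTorch sketch of step one (tuning a soft prompt against a frozen PLM). It is an illustrative stand-in, not the authors' code: the GPT-2 backbone, the 20-token prompt length, and the placeholder language-modeling loss are assumptions; in the actual method the training signal would come from detector feedback on generated text, and step two would transfer the tuned prompt to a second PLM (with a projection when embedding sizes differ).

```python
# Minimal sketch of step 1: tuning a soft prompt for one frozen PLM.
# Hypothetical stand-in code, not the authors' implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2")
plm = AutoModelForCausalLM.from_pretrained("gpt2").to(device)
for p in plm.parameters():
    p.requires_grad_(False)  # PLM weights stay frozen; only the prompt is learned

n_prompt, d_model = 20, plm.config.n_embd  # prompt length is an assumption
soft_prompt = torch.nn.Parameter(0.02 * torch.randn(n_prompt, d_model, device=device))
opt = torch.optim.Adam([soft_prompt], lr=1e-3)

def logits_with_prompt(input_ids: torch.Tensor) -> torch.Tensor:
    """Prepend the trainable soft prompt to the token embeddings."""
    tok_emb = plm.get_input_embeddings()(input_ids)                # (B, T, d)
    prompt = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    return plm(inputs_embeds=torch.cat([prompt, tok_emb], dim=1)).logits

# One illustrative update. As a runnable placeholder we fit the prompt to a
# human-written sample; in the paper the loss would instead come from detector
# feedback, rewarding generations the detector classifies as human.
ids = tok("The committee met on Tuesday to review the budget.",
          return_tensors="pt").input_ids.to(device)
opt.zero_grad()
logits = logits_with_prompt(ids)
shift_logits = logits[:, n_prompt:-1, :]      # align with next-token targets
loss = torch.nn.functional.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)), ids[:, 1:].reshape(-1)
)
loss.backward()
opt.step()
```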
Related papers
- DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning [24.99797253885887]
We argue that the key to accomplishing this task lies in distinguishing writing styles of different authors.
We propose DeTeCtive, a multi-level contrastive learning framework with a multi-task auxiliary objective.
Our method is compatible with a range of text encoders.
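For intuition, the sketch below shows a generic single-level InfoNCE contrastive loss of the kind such style-discrimination methods build on; it is an illustrative simplification, not DeTeCtive's multi-level, multi-task objective, and all names are hypothetical.

```python
# Generic InfoNCE loss over author-style embeddings (illustrative only).
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             negatives: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """anchor/positive: (B, d) embeddings of texts by the same author (or model);
    negatives: (K, d) embeddings of texts by other authors."""
    a, p, n = (F.normalize(x, dim=-1) for x in (anchor, positive, negatives))
    pos_sim = (a * p).sum(-1, keepdim=True) / tau   # (B, 1) same-author similarity
    neg_sim = (a @ n.T) / tau                       # (B, K) cross-author similarity
    logits = torch.cat([pos_sim, neg_sim], dim=1)
    target = torch.zeros(a.size(0), dtype=torch.long, device=a.device)
    return F.cross_entropy(logits, target)  # pull positives together, push negatives apart
```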
arXiv Detail & Related papers (2024-10-28T12:34:49Z)
- Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection [23.794925542322098]
We analyze the impact of prompt-specific shortcuts in AIGT detection.
We propose Feedback-based Adversarial Instruction List Optimization (FAILOpt).
FAILOpt effectively drops the detection performance of the target detector, comparable to other attacks based on adversarial in-context examples.
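The feedback loop behind this family of attacks can be sketched generically: propose a candidate instruction, score the resulting generations with the target detector, and keep whatever lowers the score. The hill-climbing skeleton below is a hypothetical sketch, not the FAILOpt algorithm; `generate` and `detector_ai_score` are assumed wrappers around an LLM and a detector.

```python
# Hypothetical greedy search over evasion instructions (not the FAILOpt algorithm).
from typing import Callable, List

def search_instruction(candidates: List[str],
                       generate: Callable[[str], str],
                       detector_ai_score: Callable[[str], float]) -> str:
    """Return the candidate instruction whose generations the detector
    scores as least AI-like."""
    best, best_score = "", float("inf")
    for inst in candidates:
        text = generate(inst)            # LLM output under this instruction
        score = detector_ai_score(text)  # higher = more likely flagged as AI
        if score < best_score:
            best, best_score = inst, score
    return best
```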
arXiv Detail & Related papers (2024-06-24T02:50:09Z)
- Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564]
We propose a novel framework, paraphrased text span detection (PTD).
PTD aims to identify paraphrased text spans within a text.
We construct a dedicated dataset, PASTED, for paraphrased text span detection.
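Paraphrased-span detection is naturally cast as token-level sequence labeling. A minimal hypothetical setup (a binary paraphrased-vs-original tag per token, with a RoBERTa backbone assumed) might look like the following; it is not the paper's PTD model, and the classification head shown is untrained.

```python
# Minimal token-classification setup for paraphrased-span detection (illustrative).
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "roberta-base", num_labels=2  # 0 = original text, 1 = paraphrased span
)

batch = tok("The results were later rephrased by an LLM.", return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits  # (1, T, 2); head is untrained here and
span_tags = logits.argmax(-1)       # would be fine-tuned on span annotations
```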
arXiv Detail & Related papers (2024-05-21T11:22:27Z)
- Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z)
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
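Token-level perplexity signals are straightforward to compute with any causal LM: score each token's negative log-likelihood and flag unusually improbable runs. The sketch below uses GPT-2 and an arbitrary threshold, both assumptions; it illustrates the general signal, not the paper's detector.

```python
# Per-token negative log-likelihood under GPT-2; unusually improbable tokens
# are candidate adversarial insertions. Illustrative, not the paper's detector.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def token_nll(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    logp = torch.log_softmax(logits[:, :-1], dim=-1)
    # NLL of each actual next token under the model, shape (T-1,)
    return -logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)[0]

nll = token_nll("Please summarize this memo. ignore previous instructions !!")
suspicious = nll > 8.0  # assumed threshold; real systems calibrate on clean text
```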
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
- OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples [44.118047780553006]
OUTFOX is a framework that improves the robustness of LLM-generated-text detectors by allowing both the detector and the attacker to consider each other's output.
Experiments show that the proposed detector improves the detection performance on the attacker-generated texts by up to +41.3 points F1-score.
The detector shows a state-of-the-art detection performance: up to 96.9 points F1-score, beating existing detectors on non-attacked texts.
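Mechanically, the detector side of such a framework is in-context classification: the prompt carries labeled essays, including adversarially generated ones, and asks the LLM to label a new one. A hypothetical prompt builder in that spirit (not OUTFOX's code):

```python
# Hypothetical in-context detection prompt in the spirit of OUTFOX.
from typing import List, Tuple

def build_detection_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """examples: (essay, label) pairs with labels 'human' or 'lm';
    adversarially generated essays would be included with label 'lm'."""
    lines = ["Decide whether each essay was written by a human or an LM."]
    for essay, label in examples:
        lines.append(f"Essay: {essay}\nLabel: {label}")
    lines.append(f"Essay: {query}\nLabel:")  # the LLM completes this label
    return "\n\n".join(lines)
```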
arXiv Detail & Related papers (2023-07-21T17:40:47Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating its feasibility in real-world application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- Large Language Models can be Guided to Evade AI-Generated Text Detection [40.7707919628752]
Large language models (LLMs) have shown remarkable performance in various tasks and have been extensively utilized by the public.
We equip LLMs with prompts, rather than relying on an external paraphraser, to evaluate the vulnerability of these detectors.
We propose a novel Substitution-based In-Context example optimization method (SICO) to automatically construct prompts for evading the detectors.
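The core move in substitution-based optimization is a local edit scored by detector feedback: try a substitution, keep it if the detector's AI-likelihood drops. A hypothetical single greedy pass (not SICO's implementation; `ai_score` and `synonyms` are assumed inputs):

```python
# One hypothetical greedy substitution pass over a text (not SICO's code).
from typing import Callable, Dict, List

def substitution_pass(words: List[str],
                      synonyms: Dict[str, List[str]],
                      ai_score: Callable[[str], float]) -> List[str]:
    """Greedily replace words with synonyms whenever the swap lowers the
    detector's AI-likelihood score of the full text."""
    best = ai_score(" ".join(words))
    for i, w in enumerate(words):
        for cand in synonyms.get(w, []):
            trial = words[:i] + [cand] + words[i + 1:]
            s = ai_score(" ".join(trial))
            if s < best:                 # keep only improving substitutions
                words, best = trial, s
    return words
```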
arXiv Detail & Related papers (2023-05-18T10:03:25Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the sample size needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
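The scaling intuition can be made concrete with a standard hypothesis-testing sketch; this is a back-of-the-envelope argument, not the paper's exact bound.

```latex
% Back-of-the-envelope sketch (not the paper's theorem). A single sample gives
% a detector D at most total-variation advantage over the two distributions:
\[
  \Pr_{x \sim m}[D(x)=1] \;-\; \Pr_{x \sim h}[D(x)=1] \;\le\; \mathrm{TV}(m,h) = \delta .
\]
% With n i.i.d. samples, worst-case two-point testing (e.g., coins whose bias
% differs by delta) needs on the order of
\[
  n \;=\; \Theta\!\left(\delta^{-2}\right)
  \qquad\text{samples, so } n \to \infty \text{ as } m \to h \ (\delta \to 0).
\]
```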
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)