RADAR: Robust AI-Text Detection via Adversarial Learning
- URL: http://arxiv.org/abs/2307.03838v2
- Date: Tue, 24 Oct 2023 16:31:49 GMT
- Title: RADAR: Robust AI-Text Detection via Adversarial Learning
- Authors: Xiaomeng Hu and Pin-Yu Chen and Tsung-Yi Ho
- Abstract summary: RADAR is based on adversarial training of a paraphraser and a detector.
The paraphraser's goal is to generate realistic content to evade AI-text detection.
RADAR uses the feedback from the detector to update the paraphraser, and vice versa.
- Score: 69.5883095262619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language models (LLMs) and the intensifying
popularity of ChatGPT-like applications have blurred the boundary of
high-quality text generation between humans and machines. However, in addition
to the anticipated revolutionary changes to our technology and society, the
difficulty of distinguishing LLM-generated texts (AI-text) from human-generated
texts poses new challenges of misuse and fairness, such as fake content
generation, plagiarism, and false accusations of innocent writers. While
existing works show that current AI-text detectors are not robust to LLM-based
paraphrasing, this paper aims to bridge this gap by proposing a new framework
called RADAR, which jointly trains a robust AI-text detector via adversarial
learning. RADAR is based on adversarial training of a paraphraser and a
detector. The paraphraser's goal is to generate realistic content to evade
AI-text detection. RADAR uses the feedback from the detector to update the
paraphraser, and vice versa. Evaluated with 8 different LLMs (Pythia, Dolly
2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, and Vicuna) across 4 datasets,
experimental results show that RADAR significantly outperforms existing AI-text
detection methods, especially when paraphrasing is in place. We also identify
the strong transferability of RADAR from instruction-tuned LLMs to other LLMs,
and evaluate the improved capability of RADAR via GPT-3.5-Turbo.
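Although the paper's exact training recipe is not reproduced here, the adversarial loop the abstract describes can be sketched as alternating updates: the detector learns to label human, AI, and paraphrased-AI text, while the paraphraser learns from the detector's feedback to push its outputs toward the "human" score. The toy below uses stand-in feature vectors and direct gradients in place of the paper's LLM paraphraser and reward-based tuning, so it illustrates only the loop structure.
```python
# Toy sketch of the adversarial loop described in the abstract. Assumptions:
# real RADAR tunes an LLM paraphraser against a Transformer detector using a
# reward derived from the detector; here both players are small stand-in
# modules over feature vectors so the loop itself is runnable.
import torch
import torch.nn as nn

DIM = 32

class Detector(nn.Module):
    """Maps a text feature vector to P(AI-generated)."""
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(x)).squeeze(-1)

class Paraphraser(nn.Module):
    """Stand-in for the LLM paraphraser: perturbs AI-text features."""
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Linear(DIM, DIM)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 0.1 * torch.tanh(self.net(x))

detector, paraphraser = Detector(), Paraphraser()
opt_d = torch.optim.Adam(detector.parameters(), lr=1e-3)
opt_p = torch.optim.Adam(paraphraser.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(200):
    human = torch.randn(64, DIM)        # placeholder human-text features
    ai = torch.randn(64, DIM) + 0.5     # placeholder AI-text features

    # Detector step: human -> 0; AI and paraphrased AI -> 1.
    para = paraphraser(ai).detach()
    scores = detector(torch.cat([human, ai, para]))
    labels = torch.cat([torch.zeros(64), torch.ones(128)])
    loss_d = bce(scores, labels)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Paraphraser step: use the detector's feedback as the training signal,
    # pushing paraphrased AI text toward the detector's "human" label.
    loss_p = bce(detector(paraphraser(ai)), torch.zeros(64))
    opt_p.zero_grad(); loss_p.backward(); opt_p.step()
```
In the real system the paraphraser outputs discrete tokens, so the detector's score cannot be backpropagated directly and is instead used as a reward; the continuous stand-in above sidesteps that solely to keep the sketch short.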
Related papers
- ESPERANTO: Evaluating Synthesized Phrases to Enhance Robustness in AI Detection for Text Origination [1.8418334324753884]
This paper introduces back-translation as a novel technique for evading detection.
We present a model that combines these back-translated texts to produce a manipulated version of the original AI-generated text.
We evaluate this technique on nine AI detectors, including six open-source and three proprietary systems.
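As a rough illustration of the back-translation idea (not ESPERANTO's exact pipeline, which also combines multiple back-translated variants), a single round trip through a pivot language can be sketched with off-the-shelf MarianMT checkpoints; the model names and pivot language here are illustrative choices, not the paper's.
```python
# Hedged sketch of back-translation as an evasion transform: round-trip the
# text through a pivot language so surface statistics shift while meaning is
# mostly preserved. Checkpoints below are common MarianMT models, chosen only
# for illustration.
from transformers import pipeline

to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

def back_translate(text: str) -> str:
    pivot = to_fr(text, max_length=512)[0]["translation_text"]
    return to_en(pivot, max_length=512)[0]["translation_text"]

ai_text = "The model generates fluent summaries of scientific articles."
print(back_translate(ai_text))  # paraphrase that may evade surface-level detectors
```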
arXiv Detail & Related papers (2024-09-22T01:13:22Z)
- Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated [8.77447722226144]
We introduce a novel ternary text classification scheme, adding an "undecided" category for texts that could be attributed to either source.
This research shifts the paradigm from merely classifying to explaining machine-generated texts, emphasizing the need for detectors to provide clear and understandable explanations to users.
arXiv Detail & Related papers (2024-06-26T11:11:47Z)
- Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564]
We propose a novel framework, paraphrased text span detection (PTD), which aims to identify paraphrased text spans within a text.
We construct a dedicated dataset, PASTED, for paraphrased text span detection.
arXiv Detail & Related papers (2024-05-21T11:22:27Z)
- Raidar: geneRative AI Detection viA Rewriting [42.477151044325595]
Large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting.
We introduce a method to detect AI-generated content by prompting LLMs to rewrite text and computing the edit distance between the input and the rewritten output.
Our results illustrate the unique imprint of machine-generated text through the lens of the machines themselves.
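A minimal sketch of this rewrite-and-compare idea: ask an LLM to rewrite the text, then measure how much it changed. `llm_rewrite` is a placeholder for any chat-model call with a rewrite prompt, and the threshold is illustrative rather than the paper's calibrated value.
```python
# Sketch of the rewrite-and-compare detection idea: LLMs edit AI-generated
# text less than human-written text when asked to rewrite it, so high
# input/output similarity suggests AI origin.
import difflib

def llm_rewrite(text: str) -> str:
    # Placeholder: send `text` to your LLM with a prompt such as
    # "Rewrite the following text: ..." and return its answer.
    raise NotImplementedError("plug in an LLM call here")

def looks_ai_generated(text: str, threshold: float = 0.85) -> bool:
    rewritten = llm_rewrite(text)
    # Similarity ratio in [0, 1]; few edits => high ratio => likely AI text.
    similarity = difflib.SequenceMatcher(None, text, rewritten).ratio()
    return similarity >= threshold
```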
arXiv Detail & Related papers (2024-01-23T18:57:53Z)
- A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions [39.36381851190369]
There is an imperative need for detectors that can reliably identify LLM-generated text.
This is crucial to mitigate potential misuse of LLMs and safeguard realms like artistic expression and social networks from harmful influence of LLM-generated content.
Detection techniques have seen notable advances recently, propelled by innovations in watermarking, statistics-based detectors, neural-based detectors, and human-assisted methods.
arXiv Detail & Related papers (2023-10-23T09:01:13Z)
- Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
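Strategy (1) can be approximated with an unconditioned WordNet substitution, sketched below; the actual attack selects synonyms for contextual fit, which this simplification omits.
```python
# Illustrative sketch of a synonym-replacement attack: swap words for WordNet
# synonyms to perturb the surface features a detector relies on. Swap
# probability is an arbitrary illustrative choice.
import random
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def synonym_swap(text: str, swap_prob: float = 0.2) -> str:
    out = []
    for word in text.split():
        # Collect candidate synonyms for this word, excluding the word itself.
        lemmas = {l.name().replace("_", " ")
                  for s in wn.synsets(word) for l in s.lemmas()} - {word}
        if lemmas and random.random() < swap_prob:
            out.append(random.choice(sorted(lemmas)))
        else:
            out.append(word)
    return " ".join(out)

print(synonym_swap("The detector flags generated text with high confidence"))
```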
arXiv Detail & Related papers (2023-05-31T10:08:37Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating the feasibility of real-world application.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
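A hedged sketch of such a retrieval defense: the provider logs an embedding of every text it generates, and a query is flagged as AI-generated if its nearest logged generation is semantically close enough. The embedding model and threshold below are illustrative assumptions, not the paper's settings.
```python
# Minimal sketch of a retrieval-based defense maintained by an API provider:
# store embeddings of past generations, flag queries that nearly match one.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = [
    "The committee approved the budget after a lengthy debate.",
    "Quantum computers exploit superposition to explore many states at once.",
]  # stand-in for the provider's log of past generations
corpus_emb = model.encode(corpus, convert_to_tensor=True)

def is_provider_generated(query: str, threshold: float = 0.8) -> bool:
    q = model.encode(query, convert_to_tensor=True)
    best = util.cos_sim(q, corpus_emb).max().item()
    # Semantic similarity survives paraphrase better than exact matching.
    return best >= threshold

print(is_provider_generated("After long discussion, the committee passed the budget."))
```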
arXiv Detail & Related papers (2023-03-23T16:29:27Z)
- Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.