Token-Ensemble Text Generation: On Attacking the Automatic AI-Generated
Text Detection
- URL: http://arxiv.org/abs/2402.11167v1
- Date: Sat, 17 Feb 2024 02:25:57 GMT
- Title: Token-Ensemble Text Generation: On Attacking the Automatic AI-Generated
Text Detection
- Authors: Fan Huang, Haewoon Kwak, Jisun An
- Abstract summary: This study proposes a novel token-ensemble generation strategy to challenge the robustness of current AI-content detection approaches.
We find the token-ensemble approach significantly drops the performance of AI-content detection models.
- Score: 7.047135911489917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The robustness of AI-content detection models against cultivated attacks
(e.g., paraphrasing or word switching) remains a significant concern. This
study proposes a novel token-ensemble generation strategy to challenge the
robustness of current AI-content detection approaches. We explore the ensemble
attack strategy by completing the prompt with the next token generated from
random candidate LLMs. We find the token-ensemble approach significantly drops
the performance of AI-content detection models (The code and test sets will be
released). Our findings reveal that token-ensemble generation poses a vital
challenge to current detection models and underlines the need for advancing
detection technologies to counter sophisticated adversarial strategies.
Related papers
- Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text [4.902089836908786]
WhosAI is a triplet-network contrastive learning framework designed to predict whether a given input text has been generated by humans or AI.
We show that our proposed framework achieves outstanding results in both the Turing Test and Authorship tasks.
arXiv Detail & Related papers (2024-07-12T15:44:56Z) - MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection [8.149808049643344]
We propose a novel hybrid approach that combines TF-IDF techniques with advanced machine learning models.
Our approach achieves superior performance compared to existing methods.
arXiv Detail & Related papers (2024-06-01T10:21:54Z) - Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack [24.954755569786396]
We propose a framework for a broader class of adversarial attacks, designed to perform minor perturbations in machine-generated content to evade detection.
We consider two attack settings: white-box and black-box, and employ adversarial learning in dynamic scenarios to assess the potential enhancement of the current detection model's robustness.
The empirical results reveal that the current detection models can be compromised in as little as 10 seconds, leading to the misclassification of machine-generated text as human-written content.
arXiv Detail & Related papers (2024-04-02T12:49:22Z) - Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z) - Token-Level Adversarial Prompt Detection Based on Perplexity Measures
and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z) - Towards Possibilities & Impossibilities of AI-generated Text Detection:
A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z) - Watermarking Conditional Text Generation for AI Detection: Unveiling
Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
arXiv Detail & Related papers (2023-07-25T20:24:22Z) - Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - "That Is a Suspicious Reaction!": Interpreting Logits Variation to
Detect NLP Adversarial Attacks [0.2999888908665659]
Adversarial attacks are a major challenge faced by current machine learning research.
Our work presents a model-agnostic detector of adversarial text examples.
arXiv Detail & Related papers (2022-04-10T09:24:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.