TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors
- URL: http://arxiv.org/abs/2503.08708v2
- Date: Thu, 13 Mar 2025 10:37:18 GMT
- Title: TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors
- Authors: Jingyi Zheng, Junfeng Wang, Zhen Sun, Wenhan Dong, Yule Liu, Xinlei He,
- Abstract summary: We introduce TH-Bench, the first comprehensive benchmark to evaluate evading attacks against MGT detectors. TH-Bench evaluates attacks across three key dimensions: evading effectiveness, text quality, and computational overhead. Our findings reveal that no single evading attack excels across all three dimensions.
- Score: 15.533392810111298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Large Language Models (LLMs) advance, Machine-Generated Texts (MGTs) have become increasingly fluent, high-quality, and informative. A wide range of existing MGT detectors is designed to identify MGTs and prevent the spread of plagiarism and misinformation. However, adversaries attempt to humanize MGTs to evade detection (termed evading attacks), which requires only minor modifications to bypass MGT detectors. Unfortunately, existing attacks generally lack a unified and comprehensive evaluation framework, as they are assessed under different experimental settings, model architectures, and datasets. To fill this gap, we introduce the Text-Humanization Benchmark (TH-Bench), the first comprehensive benchmark for evaluating evading attacks against MGT detectors. TH-Bench evaluates attacks along three key dimensions: evading effectiveness, text quality, and computational overhead. Our extensive experiments evaluate 6 state-of-the-art attacks against 13 MGT detectors on 6 datasets, spanning 19 domains and generated by 11 widely used LLMs. Our findings reveal that no single evading attack excels across all three dimensions. Through in-depth analysis, we highlight the strengths and limitations of different attacks. More importantly, we identify a trade-off among the three dimensions and propose two optimization insights. Through preliminary experiments, we validate their correctness and effectiveness, offering potential directions for future research.
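To make the three evaluation dimensions concrete, here is a minimal sketch of what such an evaluation loop could look like. All interfaces (`attack.humanize`, `detector.score`, `quality_metric`) are hypothetical placeholders, not TH-Bench's actual API:

```python
import time

def evaluate_attack(attack, detector, quality_metric, mgt_samples, threshold=0.5):
    """Score one evading attack on the three TH-Bench dimensions:
    evading effectiveness, text quality, and computational overhead.
    All objects passed in are hypothetical placeholders."""
    evaded, qualities, elapsed = 0, [], 0.0
    for text in mgt_samples:
        start = time.perf_counter()
        humanized = attack.humanize(text)          # perturb the MGT
        elapsed += time.perf_counter() - start     # computational overhead
        if detector.score(humanized) < threshold:  # detector says "human"
            evaded += 1
        qualities.append(quality_metric(text, humanized))  # e.g., semantic similarity
    n = len(mgt_samples)
    return {
        "evading_effectiveness": evaded / n,  # attack success rate
        "text_quality": sum(qualities) / n,
        "overhead_sec_per_text": elapsed / n,
    }
```

In practice, text quality would likely combine several metrics (semantic similarity, fluency, perplexity), but the three-way trade-off reported in the abstract already shows up in this simple accounting.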
Related papers
- Iron Sharpens Iron: Defending Against Attacks in Machine-Generated Text Detection with Adversarial Training [13.239171999837287]
We introduce an adversarial framework for training a robust MGT detector, named GREedy Adversary PromoTed DefendER (GREATER).
Our experimental results across 10 text perturbation strategies and 6 adversarial attacks show that our GREATER-D reduces the Attack Success Rate (ASR) by 0.67% compared with SOTA defense methods.
arXiv Detail & Related papers (2025-02-18T10:48:53Z) - Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks [48.32116554279759]
We study the robustness of popular machine-generated text detectors under attacks from diverse categories: editing, paraphrasing, prompting, and co-generating.
Our attacks assume limited access to the generator LLMs, and we compare the performance of detectors on different attacks under different budget levels.
Averaged across all detectors and attacks, performance drops by 35%.
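As an illustration of the "editing" attack category under a budget constraint, here is a minimal sketch; the budget semantics (a fraction of words deleted) are an assumption for illustration, not the paper's exact protocol:

```python
import random

def editing_attack(text: str, budget: float = 0.1, seed: int = 0) -> str:
    """Randomly delete up to a `budget` fraction of words -- a crude
    'editing' perturbation. The budget definition is an illustrative
    assumption, not the paper's exact setting."""
    rng = random.Random(seed)
    words = text.split()
    n_edits = max(1, int(len(words) * budget))
    drop = set(rng.sample(range(len(words)), n_edits))
    return " ".join(w for i, w in enumerate(words) if i not in drop)

print(editing_attack("Large language models generate increasingly fluent text."))
```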
arXiv Detail & Related papers (2024-02-18T16:36:00Z) - M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection [69.41274756177336]
Large Language Models (LLMs) have brought an unprecedented surge in machine-generated text (MGT) across diverse channels.
This raises legitimate concerns about its potential misuse and societal implications.
We introduce a new benchmark based on a multilingual, multi-domain, and multi-generator corpus of MGTs -- M4GT-Bench.
arXiv Detail & Related papers (2024-02-17T02:50:33Z) - Authorship Obfuscation in Multilingual Machine-Generated Text Detection [5.658790347990285]
Authorship obfuscation (AO) methods can enable machine-generated text (MGT) to evade detection.
We benchmark 10 well-known AO methods against 37 MGT detection methods on MGTs in 11 languages.
The results indicate that all tested AO methods can evade automated detection in all tested languages, with homoglyph attacks proving especially successful.
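A minimal sketch of a homoglyph attack: Latin characters are replaced with visually near-identical Cyrillic ones, so the text looks unchanged while its byte and token sequence differs (the character map below is illustrative):

```python
# Map a few Latin letters to visually near-identical Cyrillic homoglyphs.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic a
    "e": "\u0435",  # Cyrillic ie
    "o": "\u043e",  # Cyrillic o
    "p": "\u0440",  # Cyrillic er
    "c": "\u0441",  # Cyrillic es
}

def homoglyph_attack(text: str) -> str:
    """Swap selected Latin characters for Cyrillic lookalikes; the text
    renders the same but its code points change, which can derail
    detectors that rely on the original tokenization."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

attacked = homoglyph_attack("machine generated text")
print(attacked)                               # looks identical on screen
print(attacked == "machine generated text")   # False: different code points
```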
arXiv Detail & Related papers (2024-01-15T17:57:41Z) - Malicious Agent Detection for Robust Multi-Agent Collaborative Perception [52.261231738242266]
Multi-agent collaborative (MAC) perception is more vulnerable to adversarial attacks than single-agent perception.
We propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception.
We conduct comprehensive evaluations on the benchmark 3D dataset V2X-Sim and the real-road dataset DAIR-V2X.
arXiv Detail & Related papers (2023-10-18T11:36:42Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the sample size needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
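As background for this sample-size argument, a standard two-hypothesis testing bound (an illustrative textbook result, not necessarily the paper's exact theorem) makes the scaling explicit:

```latex
% Distinguishing the machine distribution m from the human distribution h
% with error probability at most \delta: since the squared Hellinger
% distance satisfies H^2(m,h) \ge \mathrm{TV}(m,h)^2 / 2, a number of
% i.i.d. samples
\[
  n \;=\; O\!\left(\frac{\log(1/\delta)}{\mathrm{TV}(m,h)^{2}}\right)
\]
% suffices, and n necessarily grows without bound as \mathrm{TV}(m,h) \to 0,
% i.e., as machine-generated text approaches human-like quality.
```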
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs).
We show that longer texts generally lead to better detection performance, and that most detection methods achieve similar performance with far fewer training samples.
Our findings indicate that model-based detection methods still perform well on the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z) - Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.
The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concerns about their responsible use.
We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks [84.10546708708554]
3D object detectors are increasingly crucial for security-critical tasks.
It is imperative to understand their robustness against adversarial attacks.
This paper presents the first comprehensive evaluation and analysis of the robustness of LiDAR-based 3D detectors under adversarial attacks.
arXiv Detail & Related papers (2022-12-20T13:09:58Z)