MGTBench: Benchmarking Machine-Generated Text Detection
- URL: http://arxiv.org/abs/2303.14822v3
- Date: Tue, 16 Jan 2024 02:48:05 GMT
- Title: MGTBench: Benchmarking Machine-Generated Text Detection
- Authors: Xinlei He and Xinyue Shen and Zeyuan Chen and Michael Backes and Yang Zhang
- Abstract summary: This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs).
We show that, in general, a larger number of words leads to better performance, and that most detection methods can achieve similar performance with far fewer training samples.
Our findings indicate that model-based detection methods still perform well on the text attribution task.
- Score: 54.81446366272403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nowadays, powerful large language models (LLMs) such as ChatGPT have
demonstrated revolutionary power in a variety of tasks. Consequently, the
detection of machine-generated texts (MGTs) is becoming increasingly crucial as
LLMs become more advanced and prevalent. These models have the ability to
generate human-like language, making it challenging to discern whether a text
is authored by a human or a machine. This raises concerns regarding
authenticity, accountability, and potential bias. However, existing methods for
detecting MGTs are evaluated using different model architectures, datasets, and
experimental settings, resulting in a lack of a comprehensive evaluation
framework that encompasses various methodologies. Furthermore, it remains
unclear how existing detection methods would perform against powerful LLMs. In
this paper, we fill this gap by proposing the first benchmark framework for MGT
detection against powerful LLMs, named MGTBench. Extensive evaluations on
public datasets with curated texts generated by various powerful LLMs such as
ChatGPT-turbo and Claude demonstrate the effectiveness of different detection
methods. Our ablation study shows that, in general, a larger number of words
leads to better performance, and that most detection methods can achieve
similar performance with far fewer training samples. Moreover, we delve into a more
challenging task: text attribution. Our findings indicate that the model-based
detection methods still perform well in the text attribution task. To
investigate the robustness of different detection methods, we consider three
adversarial attacks, namely paraphrasing, random spacing, and adversarial
perturbations. We discover that these attacks can significantly diminish
detection effectiveness, underscoring the critical need for the development of
more robust detection methods.
Related papers
- DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios [38.952481877244644]
We present a new benchmark, DetectRL, highlighting that even state-of-the-art (SOTA) detection techniques still underperform in this task.
Our development of DetectRL reveals the strengths and limitations of current SOTA detectors.
We believe DetectRL could serve as an effective benchmark for assessing detectors in real-world scenarios.
arXiv Detail & Related papers (2024-10-31T09:01:25Z) - Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
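The token-level scoring that these zero-shot detectors rely on can be sketched with a toy unigram model. Real detectors use an LLM's per-token log-probabilities; the tiny unigram reference below is purely illustrative:

```python
import math
from collections import Counter

def token_logprob_score(text: str, reference_counts: Counter) -> float:
    """Average per-token log-probability under a unigram reference model.

    Zero-shot detectors apply the same idea with an LLM's conditional
    token probabilities: a suspiciously high average log-probability is
    taken as evidence of machine-generated text.
    """
    total = sum(reference_counts.values())
    vocab = len(reference_counts)
    logps = []
    for tok in text.lower().split():
        # add-one smoothing so unseen tokens get small, non-zero mass
        p = (reference_counts[tok] + 1) / (total + vocab + 1)
        logps.append(math.log(p))
    return sum(logps) / len(logps)

reference = Counter("the model generates the text the user wants".split())
score_common = token_logprob_score("the the the", reference)
score_rare = token_logprob_score("zyx qqq wvu", reference)
# in-distribution tokens score higher than out-of-vocabulary ones
```

Because such scores depend entirely on local token statistics, a domain shift (or a paraphrase) can move human text into the "machine-like" score range, which motivates the latent-space features proposed in this paper.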
arXiv Detail & Related papers (2024-10-04T18:42:09Z) - RAFT: Realistic Attacks to Fool Text Detectors [16.749257564123194]
Large language models (LLMs) have exhibited remarkable fluency across various tasks.
Their unethical applications, such as disseminating disinformation, have become a growing concern.
We present RAFT: a grammar error-free black-box attack against existing LLM detectors.
arXiv Detail & Related papers (2024-10-04T17:59:00Z) - ESPERANTO: Evaluating Synthesized Phrases to Enhance Robustness in AI Detection for Text Origination [1.8418334324753884]
This paper introduces back-translation as a novel technique for evading detection.
We present a model that combines back-translated texts to produce a manipulated version of the original AI-generated text.
We evaluate this technique on nine AI detectors, including six open-source and three proprietary systems.
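The back-translation evasion described above amounts to a round trip through a pivot language. In the sketch below, the word-level stub translators are hypothetical stand-ins for real machine-translation systems:

```python
def back_translate(text: str, to_pivot, from_pivot) -> str:
    """Round-trip a text through a pivot language.

    The paraphrase that comes back preserves meaning but shifts the
    surface statistics (word choice, phrasing) that detectors score.
    """
    return from_pivot(to_pivot(text))

def dict_translator(mapping):
    """Build a toy word-for-word translator from a lookup table.
    A stand-in for a real MT system, used here only for illustration."""
    def translate(text):
        return " ".join(mapping.get(w, w) for w in text.split())
    return translate

# Hypothetical English->German->English tables; note the round trip
# maps "quick" to the synonym "fast" rather than back to itself.
to_de = dict_translator({"quick": "schnell", "fox": "Fuchs"})
from_de = dict_translator({"schnell": "fast", "Fuchs": "fox"})

result = back_translate("the quick fox", to_de, from_de)
# result: "the fast fox" -- same meaning, different surface form
```

The synonym drift introduced by the round trip is precisely what the attack exploits: the detector sees a text whose token statistics no longer match the generator's output distribution.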
arXiv Detail & Related papers (2024-09-22T01:13:22Z) - AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses.
Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies.
We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z) - SHIELD: An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models [63.946809247201905]
We introduce a new benchmark, namely SHIELD, to evaluate the ability of MLLMs on face spoofing and forgery detection.
We design true/false and multiple-choice questions to evaluate multimodal face data in these two face security tasks.
The results indicate that MLLMs hold substantial potential in the face security domain.
arXiv Detail & Related papers (2024-02-06T17:31:36Z) - OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples [44.118047780553006]
OUTFOX is a framework that improves the robustness of LLM-generated-text detectors by allowing both the detector and the attacker to consider each other's output.
Experiments show that the proposed detector improves the detection performance on the attacker-generated texts by up to +41.3 points F1-score.
The detector shows a state-of-the-art detection performance: up to 96.9 points F1-score, beating existing detectors on non-attacked texts.
arXiv Detail & Related papers (2023-07-21T17:40:47Z) - G3Detector: General GPT-Generated Text Detector [26.47122201110071]
We introduce a simple yet effective detection approach that identifies synthetic text across a wide array of fields.
Our detector performs consistently well across various model architectures and decoding strategies.
It can also identify text generated with a powerful detection-evasion technique.
arXiv Detail & Related papers (2023-05-22T03:35:00Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the number of samples needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.