Reliably Bounding False Positives: A Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction
- URL: http://arxiv.org/abs/2505.05084v2
- Date: Wed, 14 May 2025 04:38:15 GMT
- Title: Reliably Bounding False Positives: A Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction
- Authors: Xiaowei Zhu, Yubing Ren, Yanan Cao, Xixun Lin, Fang Fang, Yangxi Li
- Abstract summary: Most existing detection methods focus excessively on detection accuracy, neglecting the societal risks posed by high false positive rates (FPRs). This paper addresses this issue by leveraging Conformal Prediction (CP), which effectively constrains the upper bound of FPRs. To overcome the accompanying trade-off in detection performance, this paper proposes a Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction (MCP).
- Score: 18.76704882926895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of large language models has raised significant concerns regarding their potential misuse by malicious actors. As a result, developing effective detectors to mitigate these risks has become a critical priority. However, most existing detection methods focus excessively on detection accuracy, often neglecting the societal risks posed by high false positive rates (FPRs). This paper addresses this issue by leveraging Conformal Prediction (CP), which effectively constrains the upper bound of FPRs. While directly applying CP constrains FPRs, it also leads to a significant reduction in detection performance. To overcome this trade-off, this paper proposes a Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction (MCP), which both enforces the FPR constraint and improves detection performance. This paper also introduces RealDet, a high-quality dataset that spans a wide range of domains, ensuring realistic calibration and enabling superior detection performance when combined with MCP. Empirical evaluations demonstrate that MCP effectively constrains FPRs, significantly enhances detection performance, and increases robustness against adversarial attacks across multiple detectors and datasets.
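The FPR guarantee the abstract describes can be illustrated with a minimal split-conformal calibration sketch: score a calibration set of human-written texts with any detector, then pick the threshold so that flagging new texts above it keeps the false positive rate at or below a target alpha. This is a generic split-conformal recipe under assumed function names, not the paper's MCP implementation:

```python
import numpy as np

def calibrate_threshold(human_scores, alpha=0.05):
    """Split-conformal threshold from detector scores on human-written text.

    Flagging new texts whose score exceeds this threshold keeps the
    (marginal) false positive rate at or below alpha, assuming the
    calibration and test human texts are exchangeable.
    """
    n = len(human_scores)
    # Conformal rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration samples to certify the bound: never flag.
        return np.inf
    return np.sort(np.asarray(human_scores))[k - 1]

def detect(scores, threshold):
    """Flag texts as machine-generated when the score exceeds the threshold."""
    return np.asarray(scores) > threshold
```

With 99 calibration scores and alpha = 0.05, the threshold lands at the 95th smallest score, so at most ~5% of fresh human texts can exceed it.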
Related papers
- T-Detect: Tail-Aware Statistical Normalization for Robust Detection of Adversarial Machine-Generated Text [15.880428198252046]
Existing zero-shot detectors often rely on statistical measures that implicitly assume Gaussian distributions. This paper introduces T-Detect, a novel detection method that fundamentally redesigns the statistical core of curvature-based detectors. Our contributions are a new, theoretically justified statistical foundation for text detection, an ablation-validated method that demonstrates superior robustness, and a comprehensive analysis of its performance under adversarial conditions.
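The Gaussian-versus-heavy-tail contrast behind T-Detect can be made concrete: under a Gaussian assumption, the implied penalty for a standardized score z grows quadratically, while under a Student-t assumption it grows only logarithmically, so rare but benign extreme statistics are discounted rather than flagged aggressively. A minimal illustrative comparison (not T-Detect's actual statistic; `nu` is an assumed degrees-of-freedom parameter):

```python
import math

def gaussian_penalty(z):
    # Negative log-density of a standard Gaussian, up to a constant:
    # grows quadratically in z, so tail events look extremely surprising.
    return 0.5 * z * z

def t_penalty(z, nu=3.0):
    # Negative log-density of a Student-t with nu degrees of freedom,
    # up to a constant: grows only logarithmically in z, so heavy-tailed
    # detector statistics are penalized far less at the extremes.
    return 0.5 * (nu + 1) * math.log1p(z * z / nu)
```

At z = 10 the Gaussian penalty is 50 while the t penalty (nu = 3) is about 7, which is the qualitative robustness gap the paper's summary points to.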
arXiv Detail & Related papers (2025-07-31T14:08:04Z)
- Attack Effect Model based Malicious Behavior Detection [14.402324888945815]
We present FEAD (Focus-Enhanced Attack Detection), a framework that addresses the three key challenges of traditional security detection methods. We present an attack model-driven approach that extracts security-critical monitoring items from online attack reports for comprehensive coverage. We also show that FEAD achieves an 8.23% higher F1-score than existing solutions with only 5.4% overhead.
arXiv Detail & Related papers (2025-06-05T13:10:58Z)
- Adaptive Scoring and Thresholding with Human Feedback for Robust Out-of-Distribution Detection [6.192472816262214]
Machine Learning (ML) models are trained on in-distribution (ID) data but often encounter out-of-distribution (OOD) inputs during deployment. Recent works have focused on designing scoring functions to quantify OOD uncertainty. We propose a human-in-the-loop framework that safely updates both scoring functions and thresholds on the fly based on real-world OOD inputs.
arXiv Detail & Related papers (2025-05-05T00:25:14Z)
- A Practical Examination of AI-Generated Text Detectors for Large Language Models [25.919278893876193]
Machine-generated content detectors claim to identify such text under various conditions and from any language model. This paper critically evaluates these claims by assessing several popular detectors on a range of domains, datasets, and models that these detectors have not previously encountered.
arXiv Detail & Related papers (2024-12-06T15:56:11Z)
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
- Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting [133.55037976429088]
We investigate the adversarial robustness of vision transformers equipped with BERT pretraining (e.g., BEiT, MAE).
A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods.
We propose a simple yet effective way to boost the adversarial robustness of MAE.
arXiv Detail & Related papers (2023-08-20T16:27:17Z)
- MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs).
We show that a larger number of words generally leads to better performance, and most detection methods can achieve similar performance with far fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z)
- Free Lunch for Generating Effective Outlier Supervision [46.37464572099351]
We propose an ultra-effective method to generate near-realistic outlier supervision.
Our proposed BayesAug reduces the false positive rate by over 12.50% compared with previous schemes.
arXiv Detail & Related papers (2023-01-17T01:46:45Z)
- Probabilistic Ranking-Aware Ensembles for Enhanced Object Detections [50.096540945099704]
We propose a novel ensemble called the Probabilistic Ranking Aware Ensemble (PRAE) that refines the confidence of bounding boxes from detectors.
We also introduce a bandit approach to address the confidence imbalance problem caused by the need to deal with different numbers of boxes.
arXiv Detail & Related papers (2021-05-07T09:37:06Z)
- Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing the Bayesian Optimization technique.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in terms of accuracy, precision, recall, and a low false-alarm rate.
arXiv Detail & Related papers (2020-08-05T19:29:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.