T-Detect: Tail-Aware Statistical Normalization for Robust Detection of Adversarial Machine-Generated Text
- URL: http://arxiv.org/abs/2507.23577v2
- Date: Tue, 23 Sep 2025 14:22:04 GMT
- Title: T-Detect: Tail-Aware Statistical Normalization for Robust Detection of Adversarial Machine-Generated Text
- Authors: Alva West, Luodan Zhang, Liuliu Zhang, Minjun Zhu, Yixuan Weng, Yue Zhang,
- Abstract summary: Large language models (LLMs) have shown the capability to generate fluent and logical content, presenting significant challenges to machine-generated text detection.<n>In this paper, we introduce T-Detect, a novel detection method that fundamentally redesigns the curvature-based detectors.<n>Our contributions are a new, theoretically-justified statistical foundation for text detection, an ablation-validated method that demonstrates superior robustness, and a comprehensive analysis of its performance under adversarial conditions.
- Score: 16.983417848762826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have shown the capability to generate fluent and logical content, presenting significant challenges to machine-generated text detection, particularly text polished by adversarial perturbations such as paraphrasing. Current zero-shot detectors often employ Gaussian distributions as statistical measure for computing detection thresholds, which falters when confronted with the heavy-tailed statistical artifacts characteristic of adversarial or non-native English texts. In this paper, we introduce T-Detect, a novel detection method that fundamentally redesigns the curvature-based detectors. Our primary innovation is the replacement of standard Gaussian normalization with a heavy-tailed discrepancy score derived from the Student's t-distribution. This approach is theoretically grounded in the empirical observation that adversarial texts exhibit significant leptokurtosis, rendering traditional statistical assumptions inadequate. T-Detect computes a detection score by normalizing the log-likelihood of a passage against the expected moments of a t-distribution, providing superior resilience to statistical outliers. We validate our approach on the challenging RAID benchmark for adversarial text and the comprehensive HART dataset. Experiments show that T-Detect provides a consistent performance uplift over strong baselines, improving AUROC by up to 3.9\% in targeted domains. When integrated into a two-dimensional detection framework (CT), our method achieves state-of-the-art performance, with an AUROC of 0.926 on the Books domain of RAID. Our contributions are a new, theoretically-justified statistical foundation for text detection, an ablation-validated method that demonstrates superior robustness, and a comprehensive analysis of its performance under adversarial conditions. Ours code are released at https://github.com/ResearAI/t-detect.
Related papers
- Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection [105.14032334647932]
Machine-generated texts (MGTs) pose risks such as disinformation and phishing, highlighting the need for reliable detection.<n> Metric-based methods, which extract statistically distinguishable features of MGTs, are often more practical than complex model-based methods that are prone to overfitting.<n>We propose a Markov-informed score calibration strategy that models two relationships of context detection scores that may aid calibration.
arXiv Detail & Related papers (2026-02-08T16:06:12Z) - Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection [71.59834293521074]
We develop a framework to distinguish between human-authored and machine-generated text.<n>Our method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on DeepFake dataset.<n>Code, pretrained weights, and demo will be released.
arXiv Detail & Related papers (2025-10-07T08:14:45Z) - On the Adversarial Robustness of Learning-based Conformal Novelty Detection [10.58528988397402]
We study the adversarial robustness of conformal novelty detection using AdaDetect.<n>Our results show that adversarial perturbations can significantly increase the FDR while maintaining high detection power.
arXiv Detail & Related papers (2025-10-01T03:29:11Z) - RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns [50.401907401444404]
Large language models (LLMs) are crucial for preventing misuse and building trustworthy AI systems.<n>We propose RepreGuard, an efficient statistics-based detection method.<n> Experimental results show that RepreGuard outperforms all baselines with average 94.92% AUROC on both in-distribution (ID) and OOD scenarios.
arXiv Detail & Related papers (2025-08-18T17:59:15Z) - AI-Generated Text is Non-Stationary: Detection via Temporal Tomography [22.431500389494836]
We introduce Temporal Discrepancy Tomography (TDT), a novel detection paradigm that preserves positional information by reformulating detection as a signal processing task.<n>On the RAID benchmark, TDT achieves 0.855 AUROC (7.1% improvement over the best baseline)<n>Our work establishes non-stationarity as a fundamental characteristic of AI-generated text and demonstrates that preserving temporal dynamics is essential for robust detection.
arXiv Detail & Related papers (2025-08-03T13:43:34Z) - Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding [27.02879006439693]
This work performs a comprehensive empirical study and introduces a benchmark for text anomaly detection.<n>Our work systematically evaluates the effectiveness of embedding-based text anomaly detection.<n>By open-sourcing our benchmark toolkit, this work provides a foundation for future research in robust and scalable text anomaly detection systems.
arXiv Detail & Related papers (2025-07-16T14:47:41Z) - Lie Detector: Unified Backdoor Detection via Cross-Examination Framework [68.45399098884364]
We propose a unified backdoor detection framework in the semi-honest setting.<n>Our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines.<n> Notably, it is the first to effectively detect backdoors in multimodal large language models.
arXiv Detail & Related papers (2025-03-21T06:12:06Z) - Adversarial Attacks on AI-Generated Text Detection Models: A Token Probability-Based Approach Using Embeddings [14.150011713654331]
This work proposes a novel textual adversarial attack on the detection models such as Fast-DetectGPT.<n>The method employs embedding models for data perturbation, aiming at reconstructing the AI generated texts to reduce the likelihood of detection of the true origin of the texts.
arXiv Detail & Related papers (2025-01-31T10:06:27Z) - Training-free LLM-generated Text Detection by Mining Token Probability Sequences [18.955509967889782]
Large language models (LLMs) have demonstrated remarkable capabilities in generating high-quality texts across diverse domains.
Training-free methods, which focus on inherent discrepancies through carefully designed statistical features, offer improved generalization and interpretability.
We introduce a novel training-free detector, termed textbfLastde that synergizes local and global statistics for enhanced detection.
arXiv Detail & Related papers (2024-10-08T14:23:45Z) - Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]
We propose a trajectory-based method TV score, which uses trajectory volatility for OOD detection in mathematical reasoning.
Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios.
Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T22:22:25Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including oBERTa-Large/Base-Detector, GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs)
We show that a larger number of words in general leads to better performance and most detection methods can achieve similar performance with much fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z) - Anomaly Detection with Test Time Augmentation and Consistency Evaluation [13.709281244889691]
We propose a simple, yet effective anomaly detection algorithm named Test Time Augmentation Anomaly Detection (TTA-AD)
We observe that in-distribution data enjoy more consistent predictions for its original and augmented versions on a trained network than out-distribution data.
Experiments on various high-resolution image benchmark datasets demonstrate that TTA-AD achieves comparable or better detection performance.
arXiv Detail & Related papers (2022-06-06T04:27:06Z) - Adversarial Attacks and Defense for Non-Parametric Two-Sample Tests [73.32304304788838]
This paper systematically uncovers the failure mode of non-parametric TSTs through adversarial attacks.
To enable TST-agnostic attacks, we propose an ensemble attack framework that jointly minimizes the different types of test criteria.
To robustify TSTs, we propose a max-min optimization that iteratively generates adversarial pairs to train the deep kernels.
arXiv Detail & Related papers (2022-02-07T11:18:04Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Differential privacy and robust statistics in high dimensions [49.50869296871643]
High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism.
We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
arXiv Detail & Related papers (2021-11-12T06:36:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.