Related papers: SpecDetect: Simple, Fast, and Training-Free Detection of LLM-Generated Text via Spectral Analysis

SpecDetect: Simple, Fast, and Training-Free Detection of LLM-Generated Text via Spectral Analysis

URL: http://arxiv.org/abs/2508.11343v2
Date: Mon, 18 Aug 2025 03:05:38 GMT
Title: SpecDetect: Simple, Fast, and Training-Free Detection of LLM-Generated Text via Spectral Analysis
Authors: Haitong Luo, Weiyao Zhang, Suhang Wang, Wenji Zou, Chungang Lin, Xuying Meng, Yujun Zhang,
Abstract summary: We introduce a novel paradigm that analyzes the sequence of token log-probabilities in the frequency domain.<n>We construct SpecDetect, a detector built on a single, robust feature from the global DFT: DFT total energy.<n>Our work introduces a new, efficient, and interpretable pathway for LLM-generated text detection, showing that classical signal processing techniques offer a surprisingly powerful solution to this modern challenge.
Score: 31.43564106945543
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The proliferation of high-quality text from Large Language Models (LLMs) demands reliable and efficient detection methods. While existing training-free approaches show promise, they often rely on surface-level statistics and overlook fundamental signal properties of the text generation process. In this work, we reframe detection as a signal processing problem, introducing a novel paradigm that analyzes the sequence of token log-probabilities in the frequency domain. By systematically analyzing the signal's spectral properties using the global Discrete Fourier Transform (DFT) and the local Short-Time Fourier Transform (STFT), we find that human-written text consistently exhibits significantly higher spectral energy. This higher energy reflects the larger-amplitude fluctuations inherent in human writing compared to the suppressed dynamics of LLM-generated text. Based on this key insight, we construct SpecDetect, a detector built on a single, robust feature from the global DFT: DFT total energy. We also propose an enhanced version, SpecDetect++, which incorporates a sampling discrepancy mechanism to further boost robustness. Extensive experiments demonstrate that our approach outperforms the state-of-the-art model while running in nearly half the time. Our work introduces a new, efficient, and interpretable pathway for LLM-generated text detection, showing that classical signal processing techniques offer a surprisingly powerful solution to this modern challenge.

Related papers

Diversity Boosts AI-Generated Text Detection [51.56484100374058]
DivEye is a novel framework that captures how unpredictability fluctuates across a text using surprisal-based features.<n>Our method outperforms existing zero-shot detectors by up to 33.2% and achieves competitive performance with fine-tuned baselines.
arXiv Detail & Related papers (2025-09-23T10:21:22Z)
LLM Hallucination Detection: A Fast Fourier Transform Method Based on Hidden Layer Temporal Signals [10.85580316542761]
Hallucination remains a critical barrier for deploying large language models (LLMs) in reliability-sensitive applications.<n>We propose HSAD (Hidden Signal Analysis-based Detection), a novel hallucination detection framework that models the temporal dynamics of hidden representations.<n>Across multiple benchmarks, including TruthfulQA, HSAD achieves over 10 percentage points improvement compared to prior state-of-the-art methods.
arXiv Detail & Related papers (2025-09-16T15:08:19Z)
DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models [60.713908578319256]
We propose Direct Discrepancy Learning (DDL) to optimize the detector with task-oriented knowledge.<n>Built upon this, we introduce DetectAnyLLM, a unified detection framework that achieves state-of-the-art MGTD performance.<n>MIRAGE samples human-written texts from 10 corpora across 5 text-domains, which are then re-generated or revised using 17 cutting-edge LLMs.
arXiv Detail & Related papers (2025-09-15T10:59:57Z)
RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns [50.401907401444404]
Large language models (LLMs) are crucial for preventing misuse and building trustworthy AI systems.<n>We propose RepreGuard, an efficient statistics-based detection method.<n> Experimental results show that RepreGuard outperforms all baselines with average 94.92% AUROC on both in-distribution (ID) and OOD scenarios.
arXiv Detail & Related papers (2025-08-18T17:59:15Z)
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation [58.85645136534301]
Existing watermarking schemes for sampled text often face trade-offs between maintaining text quality and ensuring robust detection against various attacks.<n>We propose a novel watermarking scheme that improves both detectability and text quality by introducing a cumulative watermark entropy threshold.
arXiv Detail & Related papers (2025-04-16T14:16:38Z)
GaussMark: A Practical Approach for Structural Watermarking of Language Models [61.84270985214254]
GaussMark is a simple, efficient, and relatively robust scheme for watermarking large language models.<n>We show that GaussMark is reliable, efficient, and relatively robust to corruptions such as insertions, deletions, substitutions, and roundtrip translations.
arXiv Detail & Related papers (2025-01-17T22:30:08Z)
Scintillation pulse characterization with spectrum-inspired temporal neural networks: case studies on particle detector signals [1.124958340749622]
We propose a network architecture specially tailored for scintillation pulse characterization based on previous works on time series analysis.<n>We prove our idea in two case studies: (a) simulation data generated with the setting of the LUX dark matter detector, and (b) experimental electrical signals with fast electronics to emulate scintillation variations for the NICA/MPD calorimeter.
arXiv Detail & Related papers (2024-10-09T02:44:53Z)
Training-free LLM-generated Text Detection by Mining Token Probability Sequences [18.955509967889782]
Large language models (LLMs) have demonstrated remarkable capabilities in generating high-quality texts across diverse domains. Training-free methods, which focus on inherent discrepancies through carefully designed statistical features, offer improved generalization and interpretability. We introduce a novel training-free detector, termed textbfLastde that synergizes local and global statistics for enhanced detection.
arXiv Detail & Related papers (2024-10-08T14:23:45Z)
Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts. We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images. Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries. We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
arXiv Detail & Related papers (2024-03-12T01:28:00Z)
DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text [82.5469544192645]
We propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT) By analyzing the differences between the original and new remaining parts through N-gram analysis, we unveil significant discrepancies between the distribution of machine-generated text and human-written text. Results show that our zero-shot approach exhibits state-of-the-art performance in distinguishing between human and GPT-generated text.
arXiv Detail & Related papers (2023-05-27T03:58:29Z)
G3Detector: General GPT-Generated Text Detector [26.47122201110071]
We introduce an unpretentious yet potent detection approach proficient in identifying synthetic text across a wide array of fields. Our detector demonstrates outstanding performance uniformly across various model architectures and decoding strategies. It also possesses the capability to identify text generated utilizing a potent detection-evasion technique.
arXiv Detail & Related papers (2023-05-22T03:35:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.