Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling
- URL: http://arxiv.org/abs/2402.09199v1
- Date: Wed, 14 Feb 2024 14:32:16 GMT
- Title: Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling
- Authors: Yuhui Shi, Qiang Sheng, Juan Cao, Hao Mi, Beizhe Hu, Danding Wang
- Abstract summary: POGER is a proxy-guided efficient re-sampling method for black-box AIGT detection.
It outperforms all baselines in macro F1 under black-box, partial white-box, and out-of-distribution settings.
- Score: 19.780068724002888
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the rapidly increasing application of large language models (LLMs),
their abuse has caused many undesirable societal problems such as fake news,
academic dishonesty, and information pollution. This makes AI-generated text
(AIGT) detection of great importance. Among existing methods, white-box methods
are generally superior to black-box methods in terms of performance and
generalizability, but they require access to LLMs' internal states and are not
applicable to black-box settings. In this paper, we propose to estimate word
generation probabilities as pseudo white-box features via multiple re-sampling
to help improve AIGT detection under the black-box setting. Specifically, we
design POGER, a proxy-guided efficient re-sampling method, which selects a
small subset of representative words (e.g., 10 words) for performing multiple
re-sampling in black-box AIGT detection. Experiments on datasets containing
texts from humans and seven LLMs show that POGER outperforms all baselines in
macro F1 under black-box, partial white-box, and out-of-distribution settings
and maintains lower re-sampling costs than its existing counterparts.
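The abstract's central step is estimating a word's generation probability from a black-box LLM by re-sampling. One way to read it is as a Monte Carlo frequency estimate: sample the next word from the same prefix n times and count how often the original word comes back. The following is a minimal sketch under stated assumptions. `sampler` is a hypothetical callable wrapping the black-box model. The selection rule in `select_positions` (keep the k words a proxy model finds least likely) is an assumed stand-in, because the abstract does not say how POGER's proxy chooses its representative words.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical black-box interface: given a prefix (a list of words), return one
# sampled next word. In practice this would wrap a call to an LLM API.
Sampler = Callable[[Sequence[str]], str]


def estimate_word_prob(sampler: Sampler, words: Sequence[str], i: int, n_samples: int) -> float:
    """Monte Carlo estimate of P(words[i] | words[:i]) from n_samples re-samples."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    prefix = list(words[:i])
    hits = sum(1 for _ in range(n_samples) if sampler(prefix) == words[i])
    return hits / n_samples


def select_positions(proxy_probs: Sequence[float], k: int = 10) -> List[int]:
    """Assumed selection rule: keep the k positions a proxy model finds least likely."""
    ranked = sorted(range(len(proxy_probs)), key=lambda j: proxy_probs[j])
    return sorted(ranked[:k])


def pseudo_whitebox_features(
    sampler: Sampler,
    words: Sequence[str],
    proxy_probs: Sequence[float],
    k: int = 10,
    n_samples: int = 20,
) -> List[float]:
    """Re-sample only at the k selected positions instead of at every word."""
    return [estimate_word_prob(sampler, words, i, n_samples)
            for i in select_positions(proxy_probs, k)]


if __name__ == "__main__":
    rng = random.Random(0)
    demo_words = ["the", "cat", "sat", "down"]
    # Toy stand-in for a black-box LLM: picks uniformly from a tiny vocabulary.
    toy_sampler = lambda prefix: rng.choice(["the", "cat", "sat", "down"])
    proxy = [0.9, 0.3, 0.2, 0.6]  # made-up proxy probabilities, one per word
    print(pseudo_whitebox_features(toy_sampler, demo_words, proxy, k=2, n_samples=50))
```

The cost argument is the point of the selection step. With the illustrative values k = 10 and 20 samples per position, this sketch makes 200 sampler calls per text regardless of its length. Re-sampling every word would instead cost 20 calls per word.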
Related papers
- Diversity Boosts AI-Generated Text Detection [51.56484100374058]
DivEye is a novel framework that captures how unpredictability fluctuates across a text using surprisal-based features.
Our method outperforms existing zero-shot detectors by up to 33.2% and achieves competitive performance with fine-tuned baselines.
Date: 2025-09-23T10:21:22Z
- RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns [50.401907401444404]
Detecting text generated by large language models (LLMs) is crucial for preventing misuse and building trustworthy AI systems.
We propose RepreGuard, an efficient statistics-based detection method.
Experimental results show that RepreGuard outperforms all baselines with an average AUROC of 94.92% in both in-distribution (ID) and out-of-distribution (OOD) scenarios.
Date: 2025-08-18T17:59:15Z
- LeanTree: Accelerating White-Box Proof Search with Factorized States in Lean 4 [0.9619839248195652]
We introduce LeanTree, a tool built in the Lean 4 language that factorizes complex proof states into simpler, independent branches.
Our preliminary results hint that white-box approaches outperform black-box alternatives in some settings.
Date: 2025-07-19T18:50:07Z
- Automated Detection of Pre-training Text in Black-box LLMs [11.227481657336385]
VeilProbe is a framework for automatically detecting pre-training texts in a black-box setting without human intervention.
It infers the latent mapping feature between the input text and the corresponding output suffix generated by the LLM.
It perturbs key tokens to obtain more distinguishable membership features.
Date: 2025-06-24T08:08:15Z
- Towards General Visual-Linguistic Face Forgery Detection(V2) [90.6600794602029]
Face manipulation techniques have achieved significant advances, presenting serious challenges to security and social trust.
Recent works demonstrate that leveraging multimodal models can enhance the generalization and interpretability of face forgery detection.
We propose Face Forgery Text Generator (FFTG), a novel annotation pipeline that generates accurate text descriptions by leveraging forgery masks for initial region and type identification.
Date: 2025-02-28T04:15:36Z
- TextSleuth: Towards Explainable Tampered Text Detection [49.88698441048043]
We propose to explain the basis of tampered text detection with natural language via large multimodal models.
To fill the data gap for this task, we propose a large-scale, comprehensive dataset, ETTD.
Elaborate queries are introduced to generate high-quality anomaly descriptions with GPT4o.
To automatically filter out low-quality annotations, we also propose to prompt GPT4o to recognize tampered texts.
Date: 2024-12-19T13:10:03Z
- Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection [15.902823469821431]
**Glimpse** is a probability distribution estimation approach, predicting the full distributions from partial observations.
We extend white-box methods such as Entropy, Rank, Log-Rank, and Fast-DetectGPT to the latest proprietary models.
Experiments show that Glimpse with Fast-DetectGPT and GPT-3.5 achieves an average AUROC of about 0.95 across five of the latest source models.
Date: 2024-12-16T07:28:36Z
- Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore [51.65730053591696]
We propose a simple but effective black-box zero-shot detection approach.
It is predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts.
Our method achieves an average AUROC of 98.7% and shows strong robustness against paraphrase and adversarial perturbation attacks.
Date: 2024-05-07T12:57:01Z
- Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning [77.61565726647784]
We propose a simple black-box search method that first clusters and prunes the search space to focus exclusively on influential prompt tokens.
Our findings underscore the critical role of search space design and optimization in enhancing both the usefulness and the efficiency of black-box prompt-based learning.
Date: 2023-10-19T14:25:06Z
- Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature [36.31281981509264]
Large language models (LLMs) have shown the ability to produce fluent and cogent content.
To build trustworthy AI systems, it is imperative to distinguish between machine-generated and human-authored content.
Fast-DetectGPT is an optimized zero-shot detector that substitutes DetectGPT's perturbation step with a more efficient sampling step.
Date: 2023-10-08T11:41:28Z
- Language Models as Black-Box Optimizers for Vision-Language Models [62.80817942316398]
Vision-language models (VLMs) pre-trained on web-scale datasets have demonstrated remarkable capabilities on downstream tasks when fine-tuned with minimal data.
We aim to develop a black-box approach to optimize VLMs through natural language prompts.
Date: 2023-09-12T04:03:41Z
- Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level [4.250876580245865]
Existing AI-generated text classifiers have limited accuracy and often produce false positives.
We propose a novel approach using natural language processing (NLP) techniques.
We generate multiple paraphrased versions of a given question and input them into the large language model to generate answers.
By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response.
Date: 2023-06-13T20:34:55Z
- DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text [82.5469544192645]
We propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT).
By comparing the original and newly generated remaining parts of a text through N-gram analysis, we unveil significant discrepancies between the distributions of machine-generated and human-written text.
Results show that our zero-shot approach exhibits state-of-the-art performance in distinguishing between human and GPT-generated text.
Date: 2023-05-27T03:58:29Z
- Enhancing Black-Box Few-Shot Text Classification with Prompt-Based Data Augmentation [42.05617728412819]
We show how to optimize few-shot text classification without accessing the gradients of the large-scale language models.
Our approach, dubbed BT-Classifier, significantly outperforms state-of-the-art black-box few-shot learners.
Date: 2023-05-23T07:54:34Z
- MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs).
We show that a larger number of words generally leads to better performance and that most detection methods can achieve similar performance with far fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
Date: 2023-03-26T21:12:36Z
- PromptBoosting: Black-Box Text Classification with Ten Forward Passes [61.38341243907045]
We describe PromptBoosting, a query-efficient procedure for building a text classifier from a neural language model (LM) without access to the LM's parameters, gradients, or hidden representations.
Experiments show that PromptBoosting achieves state-of-the-art performance in multiple black-box few-shot classification tasks, and matches or outperforms full fine-tuning in both few-shot and standard learning paradigms, while training 10x faster than existing black-box methods.
Date: 2022-12-19T06:04:54Z
- LAMBDA: Covering the Solution Set of Black-Box Inequality by Search Space Quantization [1.345821655503426]
Black-box functions are broadly used to model complex problems that expose no explicit information other than their inputs and outputs.
Covering as much of the solution set as possible through a limited number of evaluations of the black-box objective function is defined as the Black-Box Coverage (BBC) problem.
Date: 2022-03-25T15:24:05Z
- Text Counterfactuals via Latent Optimization and Shapley-Guided Search [15.919650185010491]
We study the problem of generating counterfactual text for a classification model.
We aim to minimally alter the text to change the model's prediction.
White-box approaches have been successfully applied to similar problems in vision.
Date: 2021-10-22T05:04:40Z
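The Fast-DetectGPT entry above scores a text by its conditional probability curvature. This statistic compares the log-likelihood of the observed tokens with the mean and standard deviation of the log-likelihood of alternative tokens drawn from the model's conditional distribution. Below is a minimal sketch of a closed-form version of that statistic. It assumes that the sampling and scoring model are the same, and that a hypothetical caller supplies the model's next-token distribution at each position as a dictionary. A real detector would take these distributions from a language model's logits.

```python
import math
from typing import Dict, Sequence


def conditional_probability_curvature(
    observed: Sequence[str],
    dists: Sequence[Dict[str, float]],
) -> float:
    """Closed-form curvature score, assuming the sampling and scoring model coincide.

    dists[t] is the model's next-token distribution at position t, conditioned on the
    observed prefix; observed[t] is the token that actually appears in the text.
    Higher scores suggest machine-generated text.
    """
    log_lik = 0.0  # log-probability of the observed tokens
    mean = 0.0     # expected log-probability of sampled alternative tokens
    var = 0.0      # variance of that log-probability, summed over positions
    for tok, dist in zip(observed, dists):
        log_lik += math.log(dist[tok])
        e_log = sum(p * math.log(p) for p in dist.values() if p > 0)
        e_log_sq = sum(p * math.log(p) ** 2 for p in dist.values() if p > 0)
        mean += e_log
        var += e_log_sq - e_log ** 2
    if var <= 0:
        raise ValueError("zero variance: log-probability is constant under every distribution")
    return (log_lik - mean) / math.sqrt(var)
```

Take a single position with distribution {"a": 0.9, "b": 0.1}. Observing the likely token "a" gives a score of about 0.333, while observing "b" gives about -3.0. Computing the expectation in closed form, instead of drawing samples, is what makes this kind of statistic cheap. The closed form is only available when the model's full next-token distribution is visible, which is exactly what black-box settings lack.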