MOSAIC: Multiple Observers Spotting AI Content
- URL: http://arxiv.org/abs/2409.07615v3
- Date: Wed, 11 Jun 2025 10:13:33 GMT
- Title: MOSAIC: Multiple Observers Spotting AI Content
- Authors: Matthieu Dubois, François Yvon, Pablo Piantanida,
- Abstract summary: Large Language Models (LLMs) are trained at scale and endowed with powerful text-generating abilities.<n>We propose an approach to automatically discriminate artificially generated from human-written texts.<n>Our experiments, conducted with various generator LLMs, indicate that this approach effectively leverages the strengths of each model.
- Score: 35.67613230687864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The dissemination of Large Language Models (LLMs), trained at scale, and endowed with powerful text-generating abilities, has made it easier for all to produce harmful, toxic, faked or forged content. In response, various proposals have been made to automatically discriminate artificially generated from human-written texts, typically framing the problem as a binary classification problem. Early approaches evaluate an input document with a well-chosen detector LLM, assuming that low-perplexity scores reliably signal machine-made content. More recent systems instead consider two LLMs and compare their probability distributions over the document to further discriminate when perplexity alone cannot. However, using a fixed pair of models can induce brittleness in performance. We extend these approaches to the ensembling of several LLMs and derive a new, theoretically grounded approach to combine their respective strengths. Our experiments, conducted with various generator LLMs, indicate that this approach effectively leverages the strengths of each model, resulting in robust detection performance across multiple domains. Our code and data are available at https://github.com/BaggerOfWords/MOSAIC .
Related papers
- Token Level Hallucination Detection via Variance in Language Models [0.0]
Large Language Models (LLMs) have demonstrated impressive generative capabilities across diverse tasks but remain susceptible to hallucinations.<n>We introduce a reference-free, token-level hallucination detection framework that leverages the variance in token log-probabilities across multiple generations.<n>Our approach is model-agnostic, interpretable, and suited for real-time or post-hoc analysis.
arXiv Detail & Related papers (2025-07-05T19:20:59Z) - mdok of KInIT: Robustly Fine-tuned LLM for Binary and Multiclass AI-Generated Text Detection [0.0]
An automated detection is able to assist humans to indicate the machine-generated texts.<n>This notebook describes our mdok approach in robust detection, based on fine-tuning smaller LLMs for text classification.<n>It is applied to both subtasks of Voight-Kampff Generative AI Detection 2025.
arXiv Detail & Related papers (2025-06-02T14:07:32Z) - Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors [65.27124213266491]
We propose textbfContrastive textbfParaphrase textbfAttack (CoPA), a training-free method that effectively deceives text detectors.<n>CoPA constructs an auxiliary machine-like word distribution as a contrast to the human-like distribution generated by large language models.<n>Our theoretical analysis suggests the superiority of the proposed attack.
arXiv Detail & Related papers (2025-05-21T10:08:39Z) - Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation [88.78166077081912]
We introduce a multimodal unlearning benchmark, UnLOK-VQA, and an attack-and-defense framework to evaluate methods for deleting specific multimodal knowledge from MLLMs.<n>Our results show multimodal attacks outperform text- or image-only ones, and that the most effective defense removes answer information from internal model states.
arXiv Detail & Related papers (2025-05-01T01:54:00Z) - AuthorMist: Evading AI Text Detectors with Reinforcement Learning [4.806579822134391]
AuthorMist is a novel reinforcement learning-based system to transform AI-generated text into human-like writing.
We show that AuthorMist effectively reduces the detectability of AI-generated text while preserving the original meaning.
arXiv Detail & Related papers (2025-03-10T12:41:05Z) - Towards General Visual-Linguistic Face Forgery Detection(V2) [90.6600794602029]
Face manipulation techniques have achieved significant advances, presenting serious challenges to security and social trust.<n>Recent works demonstrate that leveraging multimodal models can enhance the generalization and interpretability of face forgery detection.<n>We propose Face Forgery Text Generator (FFTG), a novel annotation pipeline that generates accurate text descriptions by leveraging forgery masks for initial region and type identification.
arXiv Detail & Related papers (2025-02-28T04:15:36Z) - Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detection of large language models (LLMs)<n>We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and positions to weight the sum of a series of features based on metrics from the next-token distribution across the sequence length.<n>PAWN shows competitive and even better performance in-distribution than the strongest baselines with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z) - Robust Detection of LLM-Generated Text: A Comparative Analysis [0.276240219662896]
Large language models can be widely integrated into many aspects of life, and their output can quickly fill all network resources.
It becomes increasingly important to develop powerful detectors for the generated text.
This detector is essential to prevent the potential misuse of these technologies and to protect areas such as social media from the negative effects.
arXiv Detail & Related papers (2024-11-09T18:27:15Z) - DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios [38.952481877244644]
We present a new benchmark, DetectRL, highlighting that even state-of-the-art (SOTA) detection techniques still underperformed in this task.
Our development of DetectRL reveals the strengths and limitations of current SOTA detectors.
We believe DetectRL could serve as an effective benchmark for assessing detectors in real-world scenarios.
arXiv Detail & Related papers (2024-10-31T09:01:25Z) - GigaCheck: Detecting LLM-generated Content [72.27323884094953]
In this work, we investigate the task of generated text detection by proposing the GigaCheck.
Our research explores two approaches: (i) distinguishing human-written texts from LLM-generated ones, and (ii) detecting LLM-generated intervals in Human-Machine collaborative texts.
Specifically, we use a fine-tuned general-purpose LLM in conjunction with a DETR-like detection model, adapted from computer vision, to localize AI-generated intervals within text.
arXiv Detail & Related papers (2024-10-31T08:30:55Z) - DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning [24.99797253885887]
We argue that the key to accomplishing this task lies in distinguishing writing styles of different authors.
We propose DeTeCtive, a multi-task auxiliary, multi-level contrastive learning framework.
Our method is compatible with a range of text encoders.
arXiv Detail & Related papers (2024-10-28T12:34:49Z) - Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z) - ESPERANTO: Evaluating Synthesized Phrases to Enhance Robustness in AI Detection for Text Origination [1.8418334324753884]
This paper introduces back-translation as a novel technique for evading detection.
We present a model that combines these back-translated texts to produce a manipulated version of the original AI-generated text.
We evaluate this technique on nine AI detectors, including six open-source and three proprietary systems.
arXiv Detail & Related papers (2024-09-22T01:13:22Z) - DALD: Improving Logits-based Detector without Logits from Black-box LLMs [56.234109491884126]
Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing.
We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection.
DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
arXiv Detail & Related papers (2024-06-07T19:38:05Z) - Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach [0.0]
Large Language Models (LLMs) produce inaccurate outputs, also known as hallucinations.
This paper introduces a supervised learning approach employing only four numerical features derived from tokens and vocabulary probabilities obtained from other evaluators.
The method yields promising results, surpassing state-of-the-art outcomes in multiple tasks across three different benchmarks.
arXiv Detail & Related papers (2024-05-30T03:00:47Z) - Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore [51.65730053591696]
We propose a simple but effective black-box zero-shot detection approach.
It is predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts.
Our method achieves an average AUROC of 98.7% and shows strong robustness against paraphrase and adversarial perturbation attacks.
arXiv Detail & Related papers (2024-05-07T12:57:01Z) - Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text [98.28130949052313]
A score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text.
We propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs.
The method, called Binoculars, achieves state-of-the-art accuracy without any training data.
arXiv Detail & Related papers (2024-01-22T16:09:47Z) - Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text [8.290557547578146]
We introduce a novel system, T5LLMCipher, for detecting machine-generated text using a pretrained T5 encoder combined with LLM embedding sub-clustering.
We find that our approach provides state-of-the-art generalization ability, with an average increase in F1 score on machine-generated text of 19.6% on unseen generators and domains.
arXiv Detail & Related papers (2024-01-17T18:45:13Z) - A Simple yet Efficient Ensemble Approach for AI-generated Text Detection [0.5840089113969194]
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing.
It is essential to build automated approaches capable of distinguishing between artificially generated text and human-authored text.
We propose a simple yet efficient solution by ensembling predictions from multiple constituent LLMs.
arXiv Detail & Related papers (2023-11-06T13:11:02Z) - SeqXGPT: Sentence-Level AI-Generated Text Detection [62.3792779440284]
We introduce a sentence-level detection challenge by synthesizing documents polished with large language models (LLMs)
We then propose textbfSequence textbfX (Check) textbfGPT, a novel method that utilizes log probability lists from white-box LLMs as features for sentence-level AIGT detection.
arXiv Detail & Related papers (2023-10-13T07:18:53Z) - MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z) - Large Language Models can be Guided to Evade AI-Generated Text Detection [40.7707919628752]
Large language models (LLMs) have shown remarkable performance in various tasks and have been extensively utilized by the public.
We equip LLMs with prompts, rather than relying on an external paraphraser, to evaluate the vulnerability of these detectors.
We propose a novel Substitution-based In-Context example optimization method (SICO) to automatically construct prompts for evading the detectors.
arXiv Detail & Related papers (2023-05-18T10:03:25Z) - MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs)
We show that a larger number of words in general leads to better performance and most detection methods can achieve similar performance with much fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z) - Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.