Improving Logits-based Detector without Logits from Black-box LLMs
- URL: http://arxiv.org/abs/2406.05232v2
- Date: Tue, 11 Jun 2024 16:41:52 GMT
- Title: Improving Logits-based Detector without Logits from Black-box LLMs
- Authors: Cong Zeng, Shengkun Tang, Xianjun Yang, Yuanzhou Chen, Yiyou Sun, zhiqiang xu, Yao Li, Haifeng Chen, Wei Cheng, Dongkuan Xu,
- Abstract summary: Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing.
We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection.
DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
- Score: 56.234109491884126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of Large Language Models (LLMs) has revolutionized text generation, producing outputs that closely mimic human writing. This blurring of lines between machine- and human-written text presents new challenges in distinguishing one from the other a task further complicated by the frequent updates and closed nature of leading proprietary LLMs. Traditional logits-based detection methods leverage surrogate models for identifying LLM-generated content when the exact logits are unavailable from black-box LLMs. However, these methods grapple with the misalignment between the distributions of the surrogate and the often undisclosed target models, leading to performance degradation, particularly with the introduction of new, closed-source models. Furthermore, while current methodologies are generally effective when the source model is identified, they falter in scenarios where the model version remains unknown, or the test set comprises outputs from various source models. To address these limitations, we present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection even without logits from source LLMs. DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations with minimal training investment. By leveraging corpus samples from publicly accessible outputs of advanced models such as ChatGPT, GPT-4 and Claude-3, DALD fine-tunes surrogate models to synchronize with unknown source model distributions effectively.
Related papers
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT)
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z) - A Fingerprint for Large Language Models [10.63985246068255]
We propose a novel black-box fingerprinting technique for large language models (LLMs)
Experimental results indicate that the proposed technique achieves superior performance in ownership verification and robustness against PEFT attacks.
arXiv Detail & Related papers (2024-07-01T12:25:42Z) - Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression [40.4998607679863]
Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data.
This paper focuses on TG-SFT, which can synthetically generate SFT data for the instruction tuning steps.
arXiv Detail & Related papers (2024-06-17T09:17:40Z) - ReMoDetect: Reward Models Recognize Aligned LLM's Generations [55.06804460642062]
Large language models (LLMs) generate human-preferable texts.
We propose two training schemes to further improve the detection ability of the reward model.
arXiv Detail & Related papers (2024-05-27T17:38:33Z) - Learnable Linguistic Watermarks for Tracing Model Extraction Attacks on Large Language Models [20.44680783275184]
Current watermarking techniques against model extraction attacks rely on signal insertion in model logits or post-processing of generated text.
We propose a novel method for embedding learnable linguistic watermarks in Large Language Models (LLMs)
Our approach subtly modifies the LLM's output distribution by introducing controlled noise into token frequency distributions, embedding a statistically identifiable watermark.
arXiv Detail & Related papers (2024-04-28T14:45:53Z) - Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs)
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z) - LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs)
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z) - DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based
LLM [2.8233611508673]
Our study addresses gaps by introducing a novel framework named Diffusion-Enhanced Topic Modeling.
By exploiting the power of diffusion model, our framework also provides the capability to do topic based text generation.
arXiv Detail & Related papers (2023-10-23T19:03:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.