Your Large Language Models Are Leaving Fingerprints
- URL: http://arxiv.org/abs/2405.14057v1
- Date: Wed, 22 May 2024 23:02:42 GMT
- Title: Your Large Language Models Are Leaving Fingerprints
- Authors: Hope McGovern, Rickard Stureborg, Yoshi Suhara, Dimitris Alikaniotis,
- Abstract summary: LLMs possess unique fingerprints that manifest as slight differences in the frequency of certain lexical and morphosyntactic features.
We show how to visualize such fingerprints, describe how they can be used to detect machine-generated text and find that they are even robust across textual domains.
- Score: 1.9561775591923982
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been shown that finetuned transformers and other supervised detectors effectively distinguish between human and machine-generated text in some situations arXiv:2305.13242, but we find that even simple classifiers on top of n-gram and part-of-speech features can achieve very robust performance on both in- and out-of-domain data. To understand how this is possible, we analyze machine-generated output text in five datasets, finding that LLMs possess unique fingerprints that manifest as slight differences in the frequency of certain lexical and morphosyntactic features. We show how to visualize such fingerprints, describe how they can be used to detect machine-generated text and find that they are even robust across textual domains. We find that fingerprints are often persistent across models in the same model family (e.g. llama-13b vs. llama-65b) and that models fine-tuned for chat are easier to detect than standard language models, indicating that LLM fingerprints may be directly induced by the training data.
Related papers
- RKadiyala at SemEval-2024 Task 8: Black-Box Word-Level Text Boundary Detection in Partially Machine Generated Texts [0.0]
This paper introduces few reliable approaches for identifying which part of a given text is machine generated at a word level.
We present a comparison with proprietary systems, performance of our model on unseen domains' and generators' texts.
The findings reveal significant improvements in detection accuracy along with comparison on other aspects of detection capabilities.
arXiv Detail & Related papers (2024-10-22T03:21:59Z) - Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z) - Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text [98.28130949052313]
A score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text.
We propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs.
The method, called Binoculars, achieves state-of-the-art accuracy without any training data.
arXiv Detail & Related papers (2024-01-22T16:09:47Z) - Few-Shot Detection of Machine-Generated Text using Style Representations [4.326503887981912]
Language models that convincingly mimic human writing pose a significant risk of abuse.
We propose to leverage representations of writing style estimated from human-authored text.
We find that features effective at distinguishing among human authors are also effective at distinguishing human from machine authors.
arXiv Detail & Related papers (2024-01-12T17:26:51Z) - M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box
Machine-Generated Text Detection [69.29017069438228]
Large language models (LLMs) have demonstrated remarkable capability to generate fluent responses to a wide variety of user queries.
This has also raised concerns about the potential misuse of such texts in journalism, education, and academia.
In this study, we strive to create automated systems that can detect machine-generated texts and pinpoint potential misuse.
arXiv Detail & Related papers (2023-05-24T08:55:11Z) - Smaller Language Models are Better Black-box Machine-Generated Text
Detectors [56.36291277897995]
Small and partially-trained models are better universal text detectors.
We find that whether the detector and generator were trained on the same data is not critically important to the detection success.
For instance, the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has AUC of 0.45.
arXiv Detail & Related papers (2023-05-17T00:09:08Z) - How much do language models copy from their training data? Evaluating
linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
arXiv Detail & Related papers (2021-11-18T04:07:09Z) - Neural Deepfake Detection with Factual Structure of Text [78.30080218908849]
We propose a graph-based model for deepfake detection of text.
Our approach represents the factual structure of a given document as an entity graph.
Our model can distinguish the difference in the factual structure between machine-generated text and human-written text.
arXiv Detail & Related papers (2020-10-15T02:35:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.