Related papers: How well can machine-generated texts be identified and can language models be trained to avoid identification?

URL: http://arxiv.org/abs/2310.16992v1
Date: Wed, 25 Oct 2023 20:43:07 GMT
Title: How well can machine-generated texts be identified and can language models be trained to avoid identification?
Authors: Sinclair Schneider, Florian Steuber, Joao A. G. Schneider, Gabi Dreo Rodosek
Abstract summary: We refine five separate language models to generate synthetic tweets. We find that shallow learning classification algorithms, like Naive Bayes, achieve detection accuracy between 0.6 and 0.8. We find that using a reinforcement learning approach to refine our generative models can successfully evade BERT-based classifiers with a detection accuracy of 0.15 or less.
Score: 1.1606619391009658
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the rise of generative pre-trained transformer models such as GPT-3, GPT-NeoX, or OPT, distinguishing human-generated texts from machine-generated ones has become important. We refined five separate language models to generate synthetic tweets, uncovering that shallow learning classification algorithms, like Naive Bayes, achieve detection accuracy between 0.6 and 0.8. Shallow learning classifiers differ from human-based detection, especially when using higher temperature values during text generation, resulting in a lower detection rate. Humans prioritize linguistic acceptability, which tends to be higher at lower temperature values. In contrast, transformer-based classifiers have an accuracy of 0.9 and above. We found that using a reinforcement learning approach to refine our generative models can successfully evade BERT-based classifiers with a detection accuracy of 0.15 or less.
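
The shallow-learning detection setup mentioned in the abstract can be illustrated with a minimal sketch. It assumes a word-level multinomial Naive Bayes classifier with add-one smoothing; the function names (`train_naive_bayes`, `predict`, `accuracy`) and the toy tweets are hypothetical and are not taken from the paper, whose features, data, and training setup may differ.

```python
import math
from collections import Counter


def train_naive_bayes(texts, labels):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    word_counts = {label: Counter() for label in set(labels)}
    doc_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(word_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    hits = sum(predict(model, t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Toy, made-up examples; real experiments would use large tweet corpora.
train_texts = [
    "omg cant believe the game last nite lol",
    "ugh monday again need coffee asap",
    "the weather today is pleasant and the sky is clear",
    "the event was a great success and the team is proud",
]
train_labels = ["human", "human", "machine", "machine"]

model = train_naive_bayes(train_texts, train_labels)
print(predict(model, "lol need coffee"))
```

On realistic data, the detection accuracy reported for such a classifier would be computed with `accuracy` on a held-out test set rather than on the training texts.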

Related papers

Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection [60.09665704993751]
We introduce FairOPT, an algorithm for group-specific threshold optimization in AI-generated content classifiers. Our approach partitions data into subgroups based on attributes (e.g., text length and writing style) and learns decision thresholds for each group. Our framework paves the way for more robust and fair classification criteria in AI-generated output detection.
arXiv Detail & Related papers (2025-02-06T21:58:48Z)
Applying Ensemble Methods to Model-Agnostic Machine-Generated Text Detection [0.0]
We study the problem of detecting machine-generated text when the large language model it is possibly derived from is unknown. We use a zero-shot model for machine-generated text detection which is highly accurate when the generative (or base) language model is the same as the discriminative (or scoring) language model.
arXiv Detail & Related papers (2024-06-18T12:58:01Z)
Large Language Model (LLM) AI text generation detection based on transformer deep learning algorithm [0.9004420912552793]
A tool for detecting AI-generated text is developed based on the Transformer model. The deep learning model combines layers such as LSTM, Transformer, and CNN for text classification or sequence labelling tasks. The model has 99% prediction accuracy for AI-generated text, with a precision of 0.99, a recall of 1, and an F1 score of 0.99.
arXiv Detail & Related papers (2024-04-06T06:22:45Z)
Smaller Language Models are Better Black-box Machine-Generated Text Detectors [56.36291277897995]
Small and partially-trained models are better universal text detectors. We find that whether the detector and generator were trained on the same data is not critically important to the detection success. For instance, the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has an AUC of 0.45.
arXiv Detail & Related papers (2023-05-17T00:09:08Z)
Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering. Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking. We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z)
Transformer-based approaches to Sentiment Detection [55.41644538483948]
We examined the performance of four different types of state-of-the-art transformer models for text classification. The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is highly recommended for quality predictions.
arXiv Detail & Related papers (2023-03-13T17:12:03Z)
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature [143.5381108333212]
We show that text sampled from a large language model tends to occupy negative curvature regions of the model's log probability function. We then define a new curvature-based criterion for judging if a passage is generated from a given LLM. We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection.
arXiv Detail & Related papers (2023-01-26T18:44:06Z)
Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values [73.82043713141142]
Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values. We introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.
arXiv Detail & Related papers (2022-10-14T09:10:49Z)
DIALOG-22 RuATD Generated Text Detection [0.0]
Detectors that can distinguish between TGM-generated and human-written texts play an important role in preventing abuse of text generation models (TGMs). We describe our pipeline for the two DIALOG-22 RuATD tasks: detecting generated text (binary task) and classifying which model was used to generate a text.
arXiv Detail & Related papers (2022-06-16T09:33:26Z)
Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms recent PPLM, GeDi, and DExperts on PPL and sentiment accuracy based on the external classifier of generated texts. At the same time, it is also easier to implement and tune, and has significantly fewer restrictions and requirements.
arXiv Detail & Related papers (2022-05-15T12:58:35Z)
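
The idea of learning a separate decision threshold per subgroup, as described in the FairOPT entry above, can be sketched as follows. This is only an illustration under stated assumptions: it picks each group's threshold by a plain accuracy search over observed detector scores, which is a stand-in and not the FairOPT algorithm itself. The function names, group labels, and scores are hypothetical.

```python
from collections import defaultdict


def best_threshold(scores, labels):
    """Return the cut-off maximizing accuracy; a text is flagged when score >= cut-off."""
    candidates = sorted(set(scores)) + [float("inf")]

    def acc(threshold):
        return sum((s >= threshold) == y for s, y in zip(scores, labels)) / len(labels)

    return max(candidates, key=acc)


def fit_group_thresholds(records):
    """Learn one threshold per group from (group, score, is_ai) tuples."""
    by_group = defaultdict(lambda: ([], []))
    for group, score, is_ai in records:
        by_group[group][0].append(score)
        by_group[group][1].append(is_ai)
    return {g: best_threshold(s, y) for g, (s, y) in by_group.items()}


def classify(thresholds, group, score):
    """Flag a text as AI-generated using its group's own threshold."""
    return score >= thresholds[group]


# Made-up detector scores: long texts tend to score higher regardless of origin.
records = [
    ("short", 0.30, False), ("short", 0.45, False),
    ("short", 0.55, True), ("short", 0.70, True),
    ("long", 0.60, False), ("long", 0.72, False),
    ("long", 0.80, True), ("long", 0.95, True),
]

thresholds = fit_group_thresholds(records)
print(thresholds)
```

With these toy scores, a text scoring 0.65 would be flagged in the "short" group but not in the "long" group, which a single global threshold could not express.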

This list is automatically generated from the titles and abstracts of the papers on this site.