How well can machine-generated texts be identified and can language
models be trained to avoid identification?
- URL: http://arxiv.org/abs/2310.16992v1
- Date: Wed, 25 Oct 2023 20:43:07 GMT
- Title: How well can machine-generated texts be identified and can language
models be trained to avoid identification?
- Authors: Sinclair Schneider, Florian Steuber, Joao A. G. Schneider, Gabi Dreo
Rodosek
- Abstract summary: We refine five separate language models to generate synthetic tweets.
We find that shallow learning classification algorithms, like Naive Bayes, achieve detection accuracy between 0.6 and 0.8.
We find that using a reinforcement learning approach to refine our generative models can successfully evade BERT-based classifiers with a detection accuracy of 0.15 or less.
- Score: 1.1606619391009658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rise of generative pre-trained transformer models such as GPT-3,
GPT-NeoX, or OPT, distinguishing human-generated texts from machine-generated
ones has become important. We refined five separate language models to generate
synthetic tweets, uncovering that shallow learning classification algorithms,
like Naive Bayes, achieve detection accuracy between 0.6 and 0.8.
Shallow learning classifiers differ from human-based detection, especially
when using higher temperature values during text generation, resulting in a
lower detection rate. Humans prioritize linguistic acceptability, which tends
to be higher at lower temperature values. In contrast, transformer-based
classifiers have an accuracy of 0.9 and above. We found that using a
reinforcement learning approach to refine our generative models can
successfully evade BERT-based classifiers with a detection accuracy of 0.15 or
less.
Related papers
- Applying Ensemble Methods to Model-Agnostic Machine-Generated Text Detection [0.0]
We study the problem of detecting machine-generated text when the large language model it is possibly derived from is unknown.
We use a zero-shot model for machine-generated text detection which is highly accurate when the generative (or base) language model is the same as the discriminative (or scoring) language model.
arXiv Detail & Related papers (2024-06-18T12:58:01Z) - Large Language Model (LLM) AI text generation detection based on transformer deep learning algorithm [0.9004420912552793]
A tool for detecting AI text generation is developed on the Transformer model.
Deep learning model combines layers such as LSTM, Transformer and CNN for text classification or sequence labelling tasks.
The model has 99% prediction accuracy for AI-generated text, with a precision of 0.99, a recall of 1, and an f1 score of 0.99, achieving a very high classification accuracy.
arXiv Detail & Related papers (2024-04-06T06:22:45Z) - Smaller Language Models are Better Black-box Machine-Generated Text
Detectors [56.36291277897995]
Small and partially-trained models are better universal text detectors.
We find that whether the detector and generator were trained on the same data is not critically important to the detection success.
For instance, the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has AUC of 0.45.
arXiv Detail & Related papers (2023-05-17T00:09:08Z) - Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z) - Transformer-based approaches to Sentiment Detection [55.41644538483948]
We examined the performance of four different types of state-of-the-art transformer models for text classification.
The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is highly recommended for quality predictions.
arXiv Detail & Related papers (2023-03-13T17:12:03Z) - DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability
Curvature [143.5381108333212]
We show that text sampled from an large language model tends to occupy negative curvature regions of the model's log probability function.
We then define a new curvature-based criterion for judging if a passage is generated from a given LLM.
We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection.
arXiv Detail & Related papers (2023-01-26T18:44:06Z) - Enabling Classifiers to Make Judgements Explicitly Aligned with Human
Values [73.82043713141142]
Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values.
We introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.
arXiv Detail & Related papers (2022-10-14T09:10:49Z) - DIALOG-22 RuATD Generated Text Detection [0.0]
Detectors that can distinguish between TGM-generated text and human-written ones play an important role in preventing abuse of TGM.
We describe our pipeline for the two DIALOG-22 RuATD tasks: detecting generated text (binary task) and classification of which model was used to generate text.
arXiv Detail & Related papers (2022-06-16T09:33:26Z) - Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms recent PPLM, GeDi, and DExperts on PPL and sentiment accuracy based on the external classifier of generated texts.
The same time, it is also easier to implement and tune, and has significantly fewer restrictions and requirements.
arXiv Detail & Related papers (2022-05-15T12:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.