The Science of Detecting LLM-Generated Texts
- URL: http://arxiv.org/abs/2303.07205v3
- Date: Fri, 2 Jun 2023 19:24:17 GMT
- Title: The Science of Detecting LLM-Generated Texts
- Authors: Ruixiang Tang, Yu-Neng Chuang, Xia Hu
- Abstract summary: The emergence of large language models (LLMs) has resulted in the production of texts that are almost indistinguishable from texts written by humans.
This has sparked concerns about the potential misuse of such texts, such as spreading misinformation and causing disruptions in the education system.
This survey aims to provide an overview of existing LLM-generated text detection techniques and enhance the control and regulation of language generation models.
- Score: 47.49470179549773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergence of large language models (LLMs) has resulted in the production
of LLM-generated texts that is highly sophisticated and almost
indistinguishable from texts written by humans. However, this has also sparked
concerns about the potential misuse of such texts, such as spreading
misinformation and causing disruptions in the education system. Although many
detection approaches have been proposed, a comprehensive understanding of the
achievements and challenges is still lacking. This survey aims to provide an
overview of existing LLM-generated text detection techniques and enhance the
control and regulation of language generation models. Furthermore, we emphasize
crucial considerations for future research, including the development of
comprehensive evaluation metrics and the threat posed by open-source LLMs, to
drive progress in the area of LLM-generated text detection.
Related papers
- TM-TREK at SemEval-2024 Task 8: Towards LLM-Based Automatic Boundary Detection for Human-Machine Mixed Text [0.0]
This paper explores the ability of large language models to identify boundaries in human-written and machine-generated mixed texts.
Our ensemble model of LLMs achieved first place in the 'Human-Machine Mixed Text Detection' sub-task of the SemEval'24 Competition Task 8.
arXiv Detail & Related papers (2024-04-01T03:54:42Z) - Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z) - How You Prompt Matters! Even Task-Oriented Constraints in Instructions Affect LLM-Generated Text Detection [39.254432080406346]
Even task-oriented constraints -- constraints that would naturally be included in an instruction and are not related to detection-evasion -- cause existing powerful detectors to have a large variance in detection performance.
Our experiments show that the standard deviation (SD) of current detector performance on texts generated by an instruction with such a constraint is significantly larger (up to an SD of 14.4 F1-score) than that by generating texts multiple times or paraphrasing the instruction.
arXiv Detail & Related papers (2023-11-14T18:32:52Z) - Evaluating, Understanding, and Improving Constrained Text Generation for Large Language Models [49.74036826946397]
This study investigates constrained text generation for large language models (LLMs)
Our research mainly focuses on mainstream open-source LLMs, categorizing constraints into lexical, structural, and relation-based types.
Results illuminate LLMs' capacity and deficiency to incorporate constraints and provide insights for future developments in constrained text generation.
arXiv Detail & Related papers (2023-10-25T03:58:49Z) - A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLMs-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z) - A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions [39.36381851190369]
There is an imperative need to develop detectors that can detect LLM-generated text.
This is crucial to mitigate potential misuse of LLMs and safeguard realms like artistic expression and social networks from harmful influence of LLM-generated content.
The detector techniques have witnessed notable advancements recently, propelled by innovations in watermarking techniques, statistics-based detectors, neural-base detectors, and human-assisted methods.
arXiv Detail & Related papers (2023-10-23T09:01:13Z) - Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z) - MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.