Towards Possibilities & Impossibilities of AI-generated Text Detection:
A Survey
- URL: http://arxiv.org/abs/2310.15264v1
- Date: Mon, 23 Oct 2023 18:11:32 GMT
- Title: Towards Possibilities & Impossibilities of AI-generated Text Detection:
A Survey
- Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong
Huang, Dinesh Manocha, Amrit Singh Bedi
- Abstract summary: Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
- Score: 97.33926242130732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have revolutionized the domain of natural
language processing (NLP) with remarkable capabilities of generating human-like
text responses. However, despite these advancements, several works in the
existing literature have raised serious concerns about the potential misuse of
LLMs such as spreading misinformation, generating fake news, plagiarism in
academia, and contaminating the web. To address these concerns, a consensus
among the research community is to develop algorithmic solutions to detect
AI-generated text. The basic idea is that whenever we can tell if the given
text is either written by a human or an AI, we can utilize this information to
address the above-mentioned concerns. To that end, a plethora of detection
frameworks have been proposed, highlighting the possibilities of AI-generated
text detection. But in parallel to the development of detection frameworks,
researchers have also concentrated on designing strategies to elude detection,
i.e., focusing on the impossibilities of AI-generated text detection. This is a
crucial step in order to make sure the detection frameworks are robust enough
and it is not too easy to fool a detector. Despite the huge interest and the
flurry of research in this domain, the community currently lacks a
comprehensive analysis of recent developments. In this survey, we aim to
provide a concise categorization and overview of current work encompassing both
the prospects and the limitations of AI-generated text detection. To enrich the
collective knowledge, we engage in an exhaustive discussion on critical and
challenging open questions related to ongoing research on AI-generated text
detection.
Related papers
- Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods [13.14749943120523]
Knowing whether a text was produced by human or artificial intelligence (AI) is important to determining its trustworthiness.
State-of-the-art approaches to AIGT detection include watermarking, statistical and stylistic analysis, and machine learning classification.
We aim to provide insight into the salient factors that combine to determine how "detectable" AIGT text is under different scenarios.
arXiv Detail & Related papers (2024-06-21T18:31:49Z)
- Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors [24.954755569786396]
AI-text detection has emerged to distinguish between human and machine-generated content.
Recent research indicates that these detection systems often lack robustness and struggle to effectively differentiate perturbed texts.
Our work simulates real-world scenarios in both informal and professional writing, exploring the out-of-the-box performance of current detectors.
arXiv Detail & Related papers (2024-06-13T08:37:01Z)
- Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text [4.927763944523323]
Large Language Models (LLMs) have revolutionized the field of Natural Language Generation (NLG) by demonstrating an impressive ability to generate human-like text.
However, their widespread usage introduces challenges that necessitate thoughtful examination, ethical scrutiny, and responsible practices.
arXiv Detail & Related papers (2024-03-09T01:13:54Z)
- A Survey of AI-generated Text Forensic Systems: Detection, Attribution,
and Characterization [13.44566185792894]
AI-generated text forensics is an emerging field addressing the challenges of LLM misuses.
We introduce a detailed taxonomy, focusing on three primary pillars: detection, attribution, and characterization.
We explore available resources for AI-generated text forensics research and discuss the evolving challenges and future directions of forensic systems in an AI era.
arXiv Detail & Related papers (2024-03-02T09:39:13Z)
- Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z)
- Watermarking Conditional Text Generation for AI Detection: Unveiling
Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
arXiv Detail & Related papers (2023-07-25T20:24:22Z)
- Testing of Detection Tools for AI-Generated Text [0.0]
The paper examines the functionality of detection tools for artificial intelligence generated text.
It evaluates them based on accuracy and error type analysis.
The research covers 12 publicly available tools and two commercial systems.
arXiv Detail & Related papers (2023-06-21T16:29:44Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.