Towards Possibilities & Impossibilities of AI-generated Text Detection:
A Survey
- URL: http://arxiv.org/abs/2310.15264v1
- Date: Mon, 23 Oct 2023 18:11:32 GMT
- Title: Towards Possibilities & Impossibilities of AI-generated Text Detection:
A Survey
- Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong
Huang, Dinesh Manocha, Amrit Singh Bedi
- Abstract summary: Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
- Score: 97.33926242130732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have revolutionized the domain of natural
language processing (NLP) with remarkable capabilities of generating human-like
text responses. However, despite these advancements, several works in the
existing literature have raised serious concerns about the potential misuse of
LLMs such as spreading misinformation, generating fake news, plagiarism in
academia, and contaminating the web. To address these concerns, a consensus
among the research community is to develop algorithmic solutions to detect
AI-generated text. The basic idea is that whenever we can tell if the given
text is either written by a human or an AI, we can utilize this information to
address the above-mentioned concerns. To that end, a plethora of detection
frameworks have been proposed, highlighting the possibilities of AI-generated
text detection. But in parallel to the development of detection frameworks,
researchers have also concentrated on designing strategies to elude detection,
i.e., focusing on the impossibilities of AI-generated text detection. This is a
crucial step in order to make sure the detection frameworks are robust enough
and it is not too easy to fool a detector. Despite the huge interest and the
flurry of research in this domain, the community currently lacks a
comprehensive analysis of recent developments. In this survey, we aim to
provide a concise categorization and overview of current work encompassing both
the prospects and the limitations of AI-generated text detection. To enrich the
collective knowledge, we engage in an exhaustive discussion on critical and
challenging open questions related to ongoing research on AI-generated text
detection.
Related papers
- Beyond checkmate: exploring the creative chokepoints in AI text [5.427864472511595]
Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) and Artificial Intelligence (AI)
Our study investigates the nuanced distinctions between human and AI texts across text segments.
Our research can shed light on the intricacies of human-AI text distinctions, offering novel insights for text detection and understanding.
arXiv Detail & Related papers (2025-01-31T16:57:01Z) - Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods [13.14749943120523]
Knowing whether a text was produced by human or artificial intelligence (AI) is important to determining its trustworthiness.
State-of-the art approaches to AIGT detection include watermarking, statistical and stylistic analysis, and machine learning classification.
We aim to provide insight into the salient factors that combine to determine how "detectable" AIGT text is under different scenarios.
arXiv Detail & Related papers (2024-06-21T18:31:49Z) - Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text [4.927763944523323]
Large Language Models (LLMs) have revolutionized the field of Natural Language Generation (NLG) by demonstrating an impressive ability to generate human-like text.
However, their widespread usage introduces challenges that necessitate thoughtful examination, ethical scrutiny, and responsible practices.
arXiv Detail & Related papers (2024-03-09T01:13:54Z) - A Survey of AI-generated Text Forensic Systems: Detection, Attribution,
and Characterization [13.44566185792894]
AI-generated text forensics is an emerging field addressing the challenges of LLM misuses.
We introduce a detailed taxonomy, focusing on three primary pillars: detection, attribution, and characterization.
We explore available resources for AI-generated text forensics research and discuss the evolving challenges and future directions of forensic systems in an AI era.
arXiv Detail & Related papers (2024-03-02T09:39:13Z) - Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z) - Watermarking Conditional Text Generation for AI Detection: Unveiling
Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
arXiv Detail & Related papers (2023-07-25T20:24:22Z) - MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including oBERTa-Large/Base-Detector, GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.
The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use.
We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.