Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods
- URL: http://arxiv.org/abs/2406.15583v1
- Date: Fri, 21 Jun 2024 18:31:49 GMT
- Title: Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods
- Authors: Kathleen C. Fraser, Hillary Dawkins, Svetlana Kiritchenko,
- Abstract summary: Knowing whether a text was produced by human or artificial intelligence (AI) is important to determining its trustworthiness.
State-of-the art approaches to AIGT detection include watermarking, statistical and stylistic analysis, and machine learning classification.
We aim to provide insight into the salient factors that combine to determine how "detectable" AIGT text is under different scenarios.
- Score: 13.14749943120523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have advanced to a point that even humans have difficulty discerning whether a text was generated by another human, or by a computer. However, knowing whether a text was produced by human or artificial intelligence (AI) is important to determining its trustworthiness, and has applications in many domains including detecting fraud and academic dishonesty, as well as combating the spread of misinformation and political propaganda. The task of AI-generated text (AIGT) detection is therefore both very challenging, and highly critical. In this survey, we summarize state-of-the art approaches to AIGT detection, including watermarking, statistical and stylistic analysis, and machine learning classification. We also provide information about existing datasets for this task. Synthesizing the research findings, we aim to provide insight into the salient factors that combine to determine how "detectable" AIGT text is under different scenarios, and to make practical recommendations for future work towards this significant technical and societal challenge.
Related papers
- Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection [44.05134959039957]
We investigate how sociolinguistic attributes-gender, CEFR proficiency, academic field, and language environment-impact state-of-the-art AI text detectors.
Our results reveal significant biases: CEFR proficiency and language environment consistently affected detector accuracy, while gender and academic field showed detector-dependent effects.
These findings highlight the crucial need for socially aware AI text detection to avoid unfairly penalizing specific demographic groups.
arXiv Detail & Related papers (2025-02-18T07:49:31Z) - Computational Safety for Generative AI: A Signal Processing Perspective [65.268245109828]
computational safety is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI.
We show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts.
We discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
arXiv Detail & Related papers (2025-02-18T02:26:50Z) - A Survey of AI-generated Text Forensic Systems: Detection, Attribution,
and Characterization [13.44566185792894]
AI-generated text forensics is an emerging field addressing the challenges of LLM misuses.
We introduce a detailed taxonomy, focusing on three primary pillars: detection, attribution, and characterization.
We explore available resources for AI-generated text forensics research and discuss the evolving challenges and future directions of forensic systems in an AI era.
arXiv Detail & Related papers (2024-03-02T09:39:13Z) - Towards Possibilities & Impossibilities of AI-generated Text Detection:
A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z) - Who Said That? Benchmarking Social Media AI Detection [12.862865254507177]
This paper introduces SAID (Social media AI Detection), a novel benchmark developed to assess AI-text detection models' capabilities in real social media platforms.
It incorporates real AI-generate text from popular social media platforms like Zhihu and Quora.
A notable finding of our study, based on the Zhihu dataset, reveals that annotators can distinguish between AI-generated and human-generated texts with an average accuracy rate of 96.5%.
arXiv Detail & Related papers (2023-10-12T11:35:24Z) - Watermarking Conditional Text Generation for AI Detection: Unveiling
Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
arXiv Detail & Related papers (2023-07-25T20:24:22Z) - Detection of Fake Generated Scientific Abstracts [0.9525711971667679]
The academic community has expressed concerns regarding the difficulty of discriminating between what is real and what is artificially generated.
In this study, we utilize the GPT-3 model to generate scientific paper abstracts through Artificial Intelligence.
We explore various text representation methods when combined with Machine Learning models with the aim of identifying machine-written text.
arXiv Detail & Related papers (2023-04-12T20:20:22Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including oBERTa-Large/Base-Detector, GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.
The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use.
We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - The Role of AI in Drug Discovery: Challenges, Opportunities, and
Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.