A Survey of AI-generated Text Forensic Systems: Detection, Attribution,
and Characterization
- URL: http://arxiv.org/abs/2403.01152v1
- Date: Sat, 2 Mar 2024 09:39:13 GMT
- Title: A Survey of AI-generated Text Forensic Systems: Detection, Attribution,
and Characterization
- Authors: Tharindu Kumarage, Garima Agrawal, Paras Sheth, Raha Moraffah, Aman
Chadha, Joshua Garland, Huan Liu
- Abstract summary: AI-generated text forensics is an emerging field addressing the challenges of LLM misuse.
We introduce a detailed taxonomy, focusing on three primary pillars: detection, attribution, and characterization.
We explore available resources for AI-generated text forensics research and discuss the evolving challenges and future directions of forensic systems in an AI era.
- Score: 13.44566185792894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We have recently witnessed a rapid proliferation of advanced Large Language
Models (LLMs) capable of generating high-quality text. While these LLMs have
revolutionized text generation across various domains, they also pose
significant risks to the information ecosystem, such as the potential for
generating convincing propaganda, misinformation, and disinformation at scale.
This paper offers a review of AI-generated text forensic systems, an emerging
field addressing the challenges of LLM misuse. We present an overview of the
existing efforts in AI-generated text forensics by introducing a detailed
taxonomy, focusing on three primary pillars: detection, attribution, and
characterization. These pillars enable a practical understanding of
AI-generated text, from identifying AI-generated content (detection) and
determining the specific AI model involved (attribution) to grouping the
underlying intents of the text (characterization). Furthermore, we explore
available resources for AI-generated text forensics research and discuss the
evolving challenges and future directions of forensic systems in an AI era.
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated [8.77447722226144]
We introduce a novel ternary text classification scheme, adding an "undecided" category for texts that could be attributed to either source.
This research shifts the paradigm from merely classifying to explaining machine-generated texts, emphasizing the need for detectors to provide clear and understandable explanations to users.
arXiv Detail & Related papers (2024-06-26T11:11:47Z)
- Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods [13.14749943120523]
Knowing whether a text was produced by a human or by artificial intelligence (AI) is important for determining its trustworthiness.
State-of-the-art approaches to AI-generated text (AIGT) detection include watermarking, statistical and stylistic analysis, and machine learning classification.
We aim to provide insight into the salient factors that combine to determine how "detectable" AIGT is under different scenarios.
arXiv Detail & Related papers (2024-06-21T18:31:49Z)
- Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection [8.149808049643344]
We propose a novel hybrid approach that combines TF-IDF techniques with advanced machine learning models.
Our approach achieves superior performance compared to existing methods.
arXiv Detail & Related papers (2024-06-01T10:21:54Z)
- Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564]
We propose a novel framework, paraphrased text span detection (PTD).
PTD aims to identify paraphrased text spans within a text.
We construct a dedicated dataset, PASTED, for paraphrased text span detection.
arXiv Detail & Related papers (2024-05-21T11:22:27Z)
- Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z)
- Neural Authorship Attribution: Stylometric Analysis on Large Language Models [16.63955074133222]
Large language models (LLMs) such as GPT-4, PaLM, and Llama have significantly propelled the generation of AI-crafted text.
With rising concerns about their potential misuse, there is a pressing need for AI-generated-text forensics.
arXiv Detail & Related papers (2023-08-14T17:46:52Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite these challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating its feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)