Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges
- URL: http://arxiv.org/abs/2408.08946v1
- Date: Fri, 16 Aug 2024 17:58:49 GMT
- Title: Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges
- Authors: Baixiang Huang, Canyu Chen, Kai Shu
- Abstract summary: The rapid advancements of Large Language Models (LLMs) have blurred the lines between human and machine authorship.
This literature review serves as a roadmap for researchers and practitioners interested in understanding the state of the art in this rapidly evolving field.
- Score: 16.35265384114857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate attribution of authorship is crucial for maintaining the integrity of digital content, improving forensic investigations, and mitigating the risks of misinformation and plagiarism. Addressing the imperative need for proper authorship attribution is essential to uphold the credibility and accountability of authentic authorship. The rapid advancements of Large Language Models (LLMs) have blurred the lines between human and machine authorship, posing significant challenges for traditional methods. We present a comprehensive literature review that examines the latest research on authorship attribution in the era of LLMs. This survey systematically explores the landscape of this field by categorizing four representative problems: (1) Human-written Text Attribution; (2) LLM-generated Text Detection; (3) LLM-generated Text Attribution; and (4) Human-LLM Co-authored Text Attribution. We also discuss the challenges related to ensuring the generalization and explainability of authorship attribution methods. Generalization requires the ability to generalize across various domains, while explainability emphasizes providing transparent and understandable insights into the decisions made by these models. By evaluating the strengths and limitations of existing methods and benchmarks, we identify key open problems and future research directions in this field. This literature review serves as a roadmap for researchers and practitioners interested in understanding the state of the art in this rapidly evolving field. Additional resources and a curated list of papers are available and regularly updated at https://llm-authorship.github.io
Related papers
- A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
- Unveiling Large Language Models Generated Texts: A Multi-Level Fine-Grained Detection Framework [9.976099891796784]
Large language models (LLMs) have transformed human writing by enhancing grammar correction, content expansion, and stylistic refinement.
Existing detection methods, which mainly rely on single-feature analysis and binary classification, often fail to effectively identify LLM-generated text in academic contexts.
We propose a novel Multi-level Fine-grained Detection framework that detects LLM-generated text by integrating low-level structural, high-level semantic, and deep-level linguistic features.
arXiv Detail & Related papers (2024-10-18T07:25:00Z)
- Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges [7.140449861888235]
This review categorizes works in text generation into five main tasks.
For each task, we review their relevant characteristics, sub-tasks, and specific challenges.
Our investigation shows nine prominent challenges common to all tasks and sub-tasks in recent text generation publications.
arXiv Detail & Related papers (2024-05-24T14:38:11Z)
- A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law [65.87885628115946]
Large language models (LLMs) are revolutionizing the landscapes of finance, healthcare, and law.
We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies.
We critically examine the ethics for LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems.
arXiv Detail & Related papers (2024-05-02T22:43:02Z)
- Authenticity in Authorship: The Writer's Integrity Framework for Verifying Human-Generated Text [0.0]
The "Writer's Integrity" framework monitors the writing process, rather than the product, capturing the distinct behavioral footprint of human authorship.
We highlight its potential in revolutionizing the validation of human intellectual work, emphasizing its role in upholding academic integrity and intellectual property rights.
This paper outlines a business model for tech companies to monetize the framework effectively.
arXiv Detail & Related papers (2024-04-05T23:00:34Z)
- Can Large Language Models Identify Authorship? [16.35265384114857]
Large Language Models (LLMs) have demonstrated an exceptional capacity for reasoning and problem-solving.
This work seeks to address three research questions: (1) Can LLMs perform zero-shot, end-to-end authorship verification effectively?
(2) Are LLMs capable of accurately attributing authorship among multiple candidate authors (e.g., 10 and 20)?
arXiv Detail & Related papers (2024-03-13T03:22:02Z)
- A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization [13.44566185792894]
AI-generated text forensics is an emerging field addressing the challenges of LLM misuses.
We introduce a detailed taxonomy, focusing on three primary pillars: detection, attribution, and characterization.
We explore available resources for AI-generated text forensics research and discuss the evolving challenges and future directions of forensic systems in an AI era.
arXiv Detail & Related papers (2024-03-02T09:39:13Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [58.6354685593418]
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews.
The newly emerging AI-generated literature reviews are also appraised.
This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLMs-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z)
- Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.