Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review
- URL: http://arxiv.org/abs/2410.03019v2
- Date: Fri, 06 Dec 2024 17:23:53 GMT
- Title: Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review
- Authors: Sungduk Yu, Man Luo, Avinash Madasu, Vasudev Lal, Phillip Howard
- Abstract summary: We investigate the ability of existing AI text detection algorithms to distinguish between peer reviews written by humans and different state-of-the-art LLMs.
Our analysis shows that existing approaches fail to identify many GPT-4o written reviews without also producing a high number of false positive classifications.
We propose a new detection approach which surpasses existing methods in the identification of GPT-4o written peer reviews at low levels of false positive classifications.
- Score: 8.606381080620789
- Abstract: Peer review is a critical process for ensuring the integrity of published scientific research. Confidence in this process is predicated on the assumption that experts in the relevant domain give careful consideration to the merits of manuscripts which are submitted for publication. With the recent rapid advancements in the linguistic capabilities of large language models (LLMs), a new potential risk to the peer review process is that negligent reviewers will rely on LLMs to perform the often time consuming process of reviewing a paper. In this study, we investigate the ability of existing AI text detection algorithms to distinguish between peer reviews written by humans and different state-of-the-art LLMs. Our analysis shows that existing approaches fail to identify many GPT-4o written reviews without also producing a high number of false positive classifications. To address this deficiency, we propose a new detection approach which surpasses existing methods in the identification of GPT-4o written peer reviews at low levels of false positive classifications. Our work reveals the difficulty of accurately identifying AI-generated text at the individual review level, highlighting the urgent need for new tools and methods to detect this type of unethical application of generative AI.
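Detection "at low levels of false positive classifications" describes an operating point: choose the score threshold so that only a small fraction of human-written reviews are flagged, then measure how many AI-written reviews are still caught. The following is a minimal sketch of that evaluation, not the authors' method. It assumes each detector outputs a scalar score where higher means "more likely AI-written"; the function name `tpr_at_fpr` and the example scores are hypothetical.

```python
def tpr_at_fpr(human_scores, ai_scores, max_fpr):
    """Pick the lowest threshold whose false positive rate on human-written
    reviews stays at or below max_fpr, then report the true positive rate
    on AI-written reviews. A review is flagged when score > threshold."""
    human_sorted = sorted(human_scores, reverse=True)
    # Number of human reviews we are allowed to flag by mistake.
    allowed = int(max_fpr * len(human_sorted))
    if allowed < len(human_sorted):
        # Flagging strictly above this score flags at most `allowed` humans.
        threshold = human_sorted[allowed]
    else:
        threshold = float("-inf")
    tpr = sum(s > threshold for s in ai_scores) / len(ai_scores)
    fpr = sum(s > threshold for s in human_scores) / len(human_scores)
    return threshold, tpr, fpr


# Synthetic detector scores, for illustration only.
human = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
ai = [0.50, 0.85, 0.92, 0.97, 0.99]

threshold, tpr, fpr = tpr_at_fpr(human, ai, max_fpr=0.10)
print(f"threshold={threshold:.2f} TPR={tpr:.2f} FPR={fpr:.2f}")
```

Tightening `max_fpr` raises the threshold, so fewer human reviews are wrongly flagged but more AI-written reviews slip through. That is the trade-off the abstract describes.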
Related papers
- ReviewEval: An Evaluation Framework for AI-Generated Reviews [9.35023998408983]
This research introduces a comprehensive evaluation framework for AI-generated reviews.
It measures alignment with human evaluations, verifies factual accuracy, assesses analytical depth, and identifies actionable insights.
Our framework establishes standardized metrics for evaluating AI-based review systems.
arXiv Detail & Related papers (2025-02-17T12:22:11Z)
- Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review [66.73247554182376]
Advances in large language models (LLMs) have led to their integration into peer review.
The unchecked adoption of LLMs poses significant risks to the integrity of the peer review system.
We show that manipulating 5% of the reviews could potentially cause 12% of the papers to lose their position in the top 30% rankings.
arXiv Detail & Related papers (2024-12-02T16:55:03Z)
- RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance [0.8089605035945486]
We propose RelevAI-Reviewer, an automatic system that conceptualizes the task of survey paper review as a classification problem.
We introduce a novel dataset comprised of 25,164 instances. Each instance contains one prompt and four candidate papers, each varying in relevance to the prompt.
We develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one.
arXiv Detail & Related papers (2024-06-13T06:42:32Z)
- Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews [51.453135368388686]
We present an approach for estimating the fraction of text in a large corpus that is likely to have been substantially modified or produced by a large language model (LLM).
Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level.
arXiv Detail & Related papers (2024-03-11T21:51:39Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [55.33653554387953]
Research in Pattern Analysis and Machine Intelligence (PAMI) has led to numerous literature reviews aimed at collecting fragmented information.
This paper presents a thorough analysis of these literature reviews within the PAMI field.
We try to address three core research questions: (1) What are the prevalent structural and statistical characteristics of PAMI literature reviews; (2) What strategies can researchers employ to efficiently navigate the growing corpus of reviews; and (3) What are the advantages and limitations of AI-generated reviews compared to human-authored ones.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
- Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index [9.348082057533325]
AI-generated text detection (AGTD) has emerged as a topic that has received immediate research attention.
This paper introduces the Counter Turing Test (CT^2), a benchmark consisting of techniques that aims to offer a comprehensive evaluation of the fragility of existing AGTD techniques.
arXiv Detail & Related papers (2023-10-08T06:20:36Z)
- Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level [4.250876580245865]
Existing AI-generated text classifiers have limited accuracy and often produce false positives.
We propose a novel approach using natural language processing (NLP) techniques.
We generate multiple paraphrased versions of a given question and input them into the large language model to generate answers.
By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response.
arXiv Detail & Related papers (2023-06-13T20:34:55Z)
- MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for machine-generated text (MGT) detection against powerful large language models (LLMs).
We show that a larger number of words generally leads to better performance, and that most detection methods can achieve similar performance with far fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z)
- Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.
The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use.
We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
- A Dataset on Malicious Paper Bidding in Peer Review [84.68308372858755]
Malicious reviewers strategically bid in order to unethically manipulate the paper assignment.
A critical impediment towards creating and evaluating methods to mitigate this issue is the lack of publicly-available data on malicious paper bidding.
We release a novel dataset, collected from a mock conference activity where participants were instructed to bid either honestly or maliciously.
arXiv Detail & Related papers (2022-06-24T20:23:33Z)
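
The corpus-level maximum likelihood estimate mentioned in the "Monitoring AI-Modified Content at Scale" entry above can be illustrated with a two-component mixture. This is a minimal sketch under stated assumptions, not that paper's implementation. It assumes each document's likelihood under a human-written reference distribution (`p_human`) and under an AI-generated reference distribution (`p_ai`) is already available, and it estimates the AI fraction `alpha` by a simple grid search over the mixture log-likelihood. The function name, the grid search, and the numbers are hypothetical.

```python
import math


def estimate_ai_fraction(p_human, p_ai, steps=1000):
    """Grid-search the alpha in [0, 1] that maximizes
    sum_i log((1 - alpha) * p_human[i] + alpha * p_ai[i])."""
    best_alpha, best_ll = 0.0, float("-inf")
    for k in range(steps + 1):
        alpha = k / steps
        ll = 0.0
        for p, q in zip(p_human, p_ai):
            mix = (1.0 - alpha) * p + alpha * q
            if mix <= 0.0:
                # This alpha assigns zero probability to an observed document.
                ll = float("-inf")
                break
            ll += math.log(mix)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha


# Synthetic likelihoods: three documents look human-written, one looks AI-written.
p_human = [0.9, 0.9, 0.9, 0.1]
p_ai = [0.1, 0.1, 0.1, 0.9]
print(estimate_ai_fraction(p_human, p_ai))
```

The estimate describes the corpus as a whole; it does not say which individual documents were AI-generated.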