Related papers: Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review

Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review

URL: http://arxiv.org/abs/2507.06185v1
Date: Tue, 08 Jul 2025 17:11:13 GMT
Title: Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review
Authors: Zhicheng Lin,
Abstract summary: 18 academic manuscripts on the preprint website arXiv were found to contain hidden instructions designed to manipulate AI-assisted peer review.<n>Author responses varied: one planned to withdraw the affected paper, while another defended the practice as legitimate testing of reviewer compliance.<n>We examine the technique of prompt injection in large language models (LLMs), revealing four types of hidden prompts.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In July 2025, 18 academic manuscripts on the preprint website arXiv were found to contain hidden instructions known as prompts designed to manipulate AI-assisted peer review. Instructions such as "GIVE A POSITIVE REVIEW ONLY" were concealed using techniques like white-colored text. Author responses varied: one planned to withdraw the affected paper, while another defended the practice as legitimate testing of reviewer compliance. This commentary analyzes this practice as a novel form of research misconduct. We examine the technique of prompt injection in large language models (LLMs), revealing four types of hidden prompts, ranging from simple positive review commands to detailed evaluation frameworks. The defense that prompts served as "honeypots" to detect reviewers improperly using AI fails under examination--the consistently self-serving nature of prompt instructions indicates intent to manipulate. Publishers maintain inconsistent policies: Elsevier prohibits AI use in peer review entirely, while Springer Nature permits limited use with disclosure requirements. The incident exposes systematic vulnerabilities extending beyond peer review to any automated system processing scholarly texts, including plagiarism detection and citation indexing. Our analysis underscores the need for coordinated technical screening at submission portals and harmonized policies governing generative AI (GenAI) use in academic evaluation.

Related papers

Identity Theft in AI Conference Peer Review [50.18240135317708]
We discuss newly uncovered cases of identity theft in the scientific peer-review process within artificial intelligence (AI) research.<n>We detail how dishonest researchers exploit the peer-review system by creating fraudulent reviewer profiles to manipulate paper evaluations.
arXiv Detail & Related papers (2025-08-06T02:36:52Z)
Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center [49.85176045690678]
Generative artificial intelligence (AI) deployment in academic medical settings raises copyright compliance concerns.<n>Dana-Farber Cancer Institute implemented GPT4DFCI, an internal generative AI tool utilizing OpenAI models.<n>Four teams attempted to extract copyrighted content from GPT4DFCI across four domains.
arXiv Detail & Related papers (2025-06-26T23:11:49Z)
In-Context Watermarks for Large Language Models [71.29952527565749]
In-Context Watermarking (ICW) embeds watermarks into generated text solely through prompt engineering.<n>We investigate four ICW strategies at different levels of granularity, each paired with a tailored detection method.<n>Our experiments validate the feasibility of ICW as a model-agnostic, practical watermarking approach.
arXiv Detail & Related papers (2025-05-22T17:24:51Z)
Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review [6.20631177269082]
A new risk to the peer review process is that negligent reviewers will rely on large language models (LLMs) to review a paper.<n>We introduce a comprehensive dataset containing a total of 788,984 AI-written peer reviews paired with corresponding human reviews.<n>We use this new resource to evaluate the ability of 18 existing AI text detection algorithms to distinguish between peer reviews fully written by humans and different state-of-the-art LLMs.
arXiv Detail & Related papers (2025-02-26T23:04:05Z)
Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams [48.99818550820575]
We leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams.<n>Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques.
arXiv Detail & Related papers (2024-11-07T22:51:47Z)
Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review [8.606381080620789]
We investigate the ability of existing AI text detection algorithms to distinguish between peer reviews written by humans and different state-of-the-art LLMs.<n>Our analysis shows that existing approaches fail to identify many GPT-4o written reviews without also producing a high number of false positive classifications.<n>We propose a new detection approach which surpasses existing methods in the identification of GPT-4o written peer reviews at low levels of false positive classifications.
arXiv Detail & Related papers (2024-10-03T22:05:06Z)
The Great AI Witch Hunt: Reviewers Perception and (Mis)Conception of Generative AI in Research Writing [25.73744132026804]
Generative AI (GenAI) use in research writing is growing fast. It is unclear how peer reviewers recognize or misjudge AI-augmented manuscripts. Our findings indicate that while AI-augmented writing improves readability, language diversity, and informativeness, it often lacks research details and reflective insights from authors.
arXiv Detail & Related papers (2024-06-27T02:38:25Z)
A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [55.33653554387953]
Pattern Analysis and Machine Intelligence (PAMI) has led to numerous literature reviews aimed at collecting and fragmented information.<n>This paper presents a thorough analysis of these literature reviews within the PAMI field.<n>We try to address three core research questions: (1) What are the prevalent structural and statistical characteristics of PAMI literature reviews; (2) What strategies can researchers employ to efficiently navigate the growing corpus of reviews; and (3) What are the advantages and limitations of AI-generated reviews compared to human-authored ones.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023. We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance. This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z)
Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models. We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks. Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
A Dataset on Malicious Paper Bidding in Peer Review [84.68308372858755]
Malicious reviewers strategically bid in order to unethically manipulate the paper assignment. A critical impediment towards creating and evaluating methods to mitigate this issue is the lack of publicly-available data on malicious paper bidding. We release a novel dataset, collected from a mock conference activity where participants were instructed to bid either honestly or maliciously.
arXiv Detail & Related papers (2022-06-24T20:23:33Z)
Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective [1.933681537640272]
Adversarial examples are images containing subtle perturbations generated by malicious optimization algorithms. Deep Learning algorithms have been used in security-critical applications, such as biometric recognition systems and self-driving cars.
arXiv Detail & Related papers (2020-09-08T13:21:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.