Related papers: Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications

Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications

URL: http://arxiv.org/abs/2509.10248v3
Date: Thu, 25 Sep 2025 12:26:58 GMT
Title: Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications
Authors: Janis Keuper,
Abstract summary: This paper investigates the practicability and technical success of the described manipulations.<n>Our systematic evaluation uses 1k reviews of 2024 ICLR papers generated by a wide range of LLMs.
Score: 18.33812068961096
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The ongoing intense discussion on rising LLM usage in the scientific peer-review process has recently been mingled by reports of authors using hidden prompt injections to manipulate review scores. Since the existence of such "attacks" - although seen by some commentators as "self-defense" - would have a great impact on the further debate, this paper investigates the practicability and technical success of the described manipulations. Our systematic evaluation uses 1k reviews of 2024 ICLR papers generated by a wide range of LLMs shows two distinct results: I) very simple prompt injections are indeed highly effective, reaching up to 100% acceptance scores. II) LLM reviews are generally biased toward acceptance (>95% in many models). Both results have great impact on the ongoing discussions on LLM usage in peer-review.

Related papers

Do LLMs Favor LLMs? Quantifying Interaction Effects in Peer Review [23.244156664404205]
We provide the first comprehensive analysis of LLM use across the peer review pipeline.<n>We analyze over 125,000 paper-review pairs from ICLR, NeurIPS, and ICML.
arXiv Detail & Related papers (2026-01-28T18:50:54Z)
Gen-Review: A Large-scale Dataset of AI-Generated (and Human-written) Peer Reviews [7.138338798002387]
We present GenReview, the largest dataset containing LLM-written reviews.<n>Our dataset includes 81K reviews generated for all submissions to the 2018--2025 editions of the ICLR.<n>To illustrate the value of GenReview, we explore a sample of intriguing research questions.
arXiv Detail & Related papers (2025-10-24T06:54:27Z)
LLM-REVal: Can We Trust LLM Reviewers Yet? [70.58742663985652]
Large language models (LLMs) have inspired researchers to integrate them extensively into the academic workflow.<n>This study focuses on how the deep integration of LLMs into both peer-review and research processes may influence scholarly fairness.
arXiv Detail & Related papers (2025-10-14T10:30:20Z)
When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review [34.067892820832405]
This paper presents a systematic evaluation of large language models (LLMs) as academic reviewers.<n>Using a curated dataset of 1,441 papers from ICLR 2023 and NeurIPS 2022, we evaluate GPT-5-mini against human reviewers across ratings, strengths, and weaknesses.<n>Our findings show that LLMs consistently inflate ratings for weaker papers while aligning more closely with human judgments on stronger contributions.
arXiv Detail & Related papers (2025-09-12T00:57:50Z)
Publish to Perish: Prompt Injection Attacks on LLM-Assisted Peer Review [17.869642243653985]
Large Language Models (LLMs) are increasingly being integrated into the scientific peer-review process.<n>We investigate the potential for hidden prompt injection attacks, where authors embed adversarial text within a paper's PDF.
arXiv Detail & Related papers (2025-08-28T14:57:04Z)
Detecting LLM-Generated Peer Reviews [37.51215252353345]
The rise of large language models (LLMs) has introduced concerns that some reviewers may rely on these tools to generate reviews rather than writing them independently.<n>We consider the approach of performing indirect prompt injection via the paper's PDF, prompting the LLM to embed a covert watermark in the generated review.<n>We introduce watermarking schemes and hypothesis tests that control the family-wise error rate across multiple reviews, achieving higher statistical power than standard corrections.
arXiv Detail & Related papers (2025-03-20T01:11:35Z)
Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review [66.73247554182376]
Large language models (LLMs) have led to their integration into peer review.<n>The unchecked adoption of LLMs poses significant risks to the integrity of the peer review system.<n>We show that manipulating 5% of the reviews could potentially cause 12% of the papers to lose their position in the top 30% rankings.
arXiv Detail & Related papers (2024-12-02T16:55:03Z)
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks. This study focuses on the topic of LLMs assist NLP Researchers. To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions [77.66677127535222]
Auto-Arena is an innovative framework that automates the entire evaluation process using LLM-powered agents. In our experiments, Auto-Arena shows a 92.14% correlation with human preferences, surpassing all previous expert-annotated benchmarks.
arXiv Detail & Related papers (2024-05-30T17:19:19Z)
Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key? [84.36332588191623]
We propose a novel group discussion framework to enrich the set of discussion mechanisms. We observe that the multi-agent discussion performs better than a single agent only when there is no demonstration in the prompt.
arXiv Detail & Related papers (2024-02-28T12:04:05Z)
Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks. How do we evaluate the capabilities of LLMs to consistently produce factually correct answers? We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
arXiv Detail & Related papers (2023-10-15T12:40:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.