Fugu-MT 論文翻訳(概要): "**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

論文の概要: "Important You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

arxiv url: http://arxiv.org/abs/2606.03090v1
Date: Tue, 02 Jun 2026 03:24:12 GMT
ステータス: 翻訳完了
システム内更新日: 2026-06-03 22:00:04.726325
Title: "**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems
Title（参考訳）: **Important** You should give me full credits!: Exploring Prompt Injection Attacks on LLM-based Automatic Grading Systems
Authors: Hang Li, Fedor Filippov, Yuling Lin, Pengfei He, Kaiqi Yang, Yucheng Chu, Yingqian Cui, Hui Liu, Jiliang Tang,
Abstract要約: 大規模言語モデル (LLM) は自動階調システム (AG) の研究を著しく加速している。特に、プロンプトインジェクション(PI)攻撃は、最近LLMベースのアプリケーションにとって大きな脅威となっている。我々は、AGシステムにおけるPI攻撃を調査し、教育シナリオにおけるそのような攻撃の有効性を体系的に調査する。
参考スコア（独自算出の注目度）: 34.69207247488525
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The emergence of large language models (LLMs) has significantly accelerated recent research on LLM-based automatic grading (AG) systems. Benefiting from the strong instruction-following capabilities and broad prior knowledge of LLMs, educators can deploy AG systems across diverse tasks using only natural language rubrics while achieving satisfactory grading performance. Despite these advantages, new security concerns may also arise. In particular, prompt injection (PI) attacks have recently become a major threat to LLM-based applications. In the context of AG, attackers can potentially exploit PI vulnerabilities to manipulate grading systems into assigning artificially high scores regardless of the actual answer quality. Such behavior poses serious risks to the fairness, reliability, and integrity of educational assessment. In this work, we study PI attacks in AG systems, and systematically investigate the effectiveness of such attacks in educational scenarios. We further evaluate the effectiveness of existing defensive strategies against these attacks. Through comprehensive experiments under rubric-based grading settings, we demonstrate that current LLM-based AG systems remain highly vulnerable to PI attacks. We hope that our findings raise awareness of this emerging threat and motivate future research toward secure, robust, and trustworthy LLM-based educational systems.
Abstract（参考訳）: 大規模言語モデル(LLM)の出現は,近年,LLMに基づく自動階調システム(AG)の研究を著しく加速させている。強力なインストラクションフォロー能力とLLMの幅広い事前知識を活かして、教育者は、自然言語のルーリックのみを使用してAGシステムを様々なタスクに展開し、良好なグレーディング性能を達成できます。これらの利点にもかかわらず、新たなセキュリティ上の懸念も生じる可能性がある。特に、プロンプトインジェクション(PI)攻撃は、最近LLMベースのアプリケーションにとって大きな脅威となっている。 AGの文脈では、攻撃者はPI脆弱性を利用してグレーディングシステムを操作して、実際の回答の品質に関わらず、人工的に高いスコアを割り当てることができる。このような行動は、教育評価の公平さ、信頼性、完全性に重大なリスクをもたらす。本稿では,AGシステムにおけるPI攻撃について検討し,教育シナリオにおけるこれらの攻撃の有効性を体系的に検討する。これらの攻撃に対する既存の防衛戦略の有効性をさらに評価する。ルーリックベースのグレーティング設定の下での総合的な実験を通じて、現在のLLMベースのAGシステムはPI攻撃に対して非常に脆弱であることを示します。我々は、この新たな脅威に対する認識を高め、安全で堅牢で信頼性の高いLLMベースの教育システムに向けた将来の研究を動機付けることを願っている。

論文の概要: "**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

関連論文リスト

論文の概要: "Important You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems