Fugu-MT Paper Translation (Abstract): Security in LLM-as-a-Judge: A Comprehensive SoK

Paper overview: Security in LLM-as-a-Judge: A Comprehensive SoK

arxiv url: http://arxiv.org/abs/2603.29403v1
Date: Tue, 31 Mar 2026 08:05:54 GMT
Status: Translation complete
Last updated in system: 2026-04-01 15:25:03.33829
Title: Security in LLM-as-a-Judge: A Comprehensive SoK
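Authors: Aiman Almasoud, Antony Anju, Marco Arazzi, Mert Cihangiroglu, Vignesh Kumar Kembu, Serena Nicolazzo, Antonino Nocera, Vinod P., Saraga Sakthidharan
Abstract summary: This paper presents the first Systematization of Knowledge (SoK) focused on the security aspects of LLM-as-a-Judge systems. It proposes a taxonomy that organizes recent research according to the role LLM-as-a-Judge plays in the security landscape. The findings reveal significant vulnerabilities in LLM-based evaluation frameworks, along with promising directions for improving their robustness and reliability.
Reference score (attention measure computed independently by the site): 3.5168742057928246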
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM-as-a-Judge (LaaJ) is a novel paradigm in which powerful language models are used to assess the quality, safety, or correctness of generated outputs. While this paradigm has significantly improved the scalability and efficiency of evaluation processes, it also introduces novel security risks and reliability concerns that remain largely unexplored. In particular, LLM-based judges can become both targets of adversarial manipulation and instruments through which attacks are conducted, potentially compromising the trustworthiness of evaluation pipelines. In this paper, we present the first Systematization of Knowledge (SoK) focusing on the security aspects of LLM-as-a-Judge systems. We perform a comprehensive literature review across major academic databases, analyzing 863 works and selecting 45 relevant studies published between 2020 and 2026. Based on this study, we propose a taxonomy that organizes recent research according to the role played by LLM-as-a-Judge in the security landscape, distinguishing between attacks targeting LaaJ systems, attacks performed through LaaJ, defenses leveraging LaaJ for security purposes, and applications where LaaJ is used as an evaluation strategy in security-related domains. We further provide a comparative analysis of existing approaches, highlighting current limitations, emerging threats, and open research challenges. Our findings reveal significant vulnerabilities in LLM-based evaluation frameworks, as well as promising directions for improving their robustness and reliability. Finally, we outline key research opportunities that can guide the development of more secure and trustworthy LLM-as-a-Judge systems.

Related papers