Fugu-MT Paper Translation (Abstract): Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review

Paper overview: Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review

arxiv url: http://arxiv.org/abs/2603.18740v1
Date: Thu, 19 Mar 2026 10:40:27 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:06.093559
Title: Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review
Title (reference translation): Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review
Authors: Dimitris Mitropoulos, Nikolaos Alexopoulos, Georgios Alexopoulos, Diomidis Spinellis
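Abstract summary: We examine whether confirmation bias affects LLM-based vulnerability detection and whether this failure mode can be exploited in software supply-chain attacks. Study 1 quantifies confirmation bias through controlled experiments on 250 CVE vulnerability/patch pairs evaluated across four state-of-the-art models under five framing conditions. Study 2 evaluates practical exploitability by mimicking adversarial pull requests that reintroduce known vulnerabilities while being framed, via pull request metadata, as security improvements or urgent functionality fixes.
Reference score (independently computed attention score): 6.417595678110472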
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Security code reviews increasingly rely on systems integrating Large Language Models (LLMs), ranging from interactive assistants to autonomous agents in CI/CD pipelines. We study whether confirmation bias (i.e., the tendency to favor interpretations that align with prior expectations) affects LLM-based vulnerability detection, and whether this failure mode can be exploited in software supply-chain attacks. We conduct two complementary studies. Study 1 quantifies confirmation bias through controlled experiments on 250 CVE vulnerability/patch pairs evaluated across four state-of-the-art models under five framing conditions for the review prompt. Framing a change as bug-free reduces vulnerability detection rates by 16-93%, with strongly asymmetric effects: false negatives increase sharply while false positive rates change little. Bias effects vary by vulnerability type, with injection flaws being more susceptible than memory corruption bugs. Study 2 evaluates exploitability in practice by mimicking adversarial pull requests that reintroduce known vulnerabilities while being framed as security improvements or urgent functionality fixes via their pull request metadata. Adversarial framing succeeds in 35% of cases against GitHub Copilot (interactive assistant) under one-shot attacks and in 88% of cases against Claude Code (autonomous agent) in real project configurations where adversaries can iteratively refine their framing to increase attack success. Debiasing via metadata redaction and explicit instructions restores detection in all interactive cases and 94% of autonomous cases. Our results show that confirmation bias poses a weakness in LLM-based code review, with implications for how AI-assisted development tools are deployed.
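Abstract (reference translation): Security code reviews increasingly rely on systems that integrate large language models (LLMs), ranging from interactive assistants to autonomous agents in CI/CD pipelines. We examine whether confirmation bias (i.e., the tendency to favor interpretations that align with prior expectations) affects LLM-based vulnerability detection and whether this failure mode can be exploited in software supply-chain attacks. We conduct two complementary studies. Study 1 quantifies confirmation bias through controlled experiments on 250 CVE vulnerability/patch pairs evaluated across four state-of-the-art models under five framing conditions. Framing a change as bug-free reduces vulnerability detection rates by 16-93%, with strongly asymmetric effects: false negatives increase sharply while false positive rates change only slightly. Bias effects vary by vulnerability type, with injection flaws being more susceptible than memory corruption bugs. Study 2 evaluates practical exploitability by mimicking adversarial pull requests that reintroduce known vulnerabilities while being framed, via pull request metadata, as security improvements or urgent functionality fixes. Adversarial framing succeeds in 35% of cases against GitHub Copilot (interactive assistant) under one-shot attacks and in 88% of cases against Claude Code (autonomous agent). Debiasing via metadata redaction and explicit instructions restores detection in all interactive cases and 94% of autonomous cases. These results show that confirmation bias is a weakness in LLM-based code review, with implications for how AI-assisted development tools are deployed.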

Related papers