Fugu-MT paper translation (abstract): Search-Induced Issues in Web-Augmented LLM Code Generation: Detecting and Repairing Error-Inducing Pages

Paper overview: Search-Induced Issues in Web-Augmented LLM Code Generation: Detecting and Repairing Error-Inducing Pages

arxiv url: http://arxiv.org/abs/2603.26091v1
Date: Fri, 27 Mar 2026 05:42:36 GMT
Status: translation complete
Last updated in system: 2026-03-30 21:49:48.364808
Title: Search-Induced Issues in Web-Augmented LLM Code Generation: Detecting and Repairing Error-Inducing Pages
Title (reference translation): Search-Induced Issues in Web-Augmented LLM Code Generation: Detecting and Repairing Error-Inducing Pages
Authors: Guoqing Wang, Zeyu Sun, Xiaofei Xie, Yizhou Chen, Yanchao Tan, Yifan Zhao, Dan Hao,
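Abstract summary: Web-augmented large language models (LLMs) offer promising capabilities for automatic code generation. Live web search exposes models to unreliable or malicious content, leading to Search-Induced Issues (SII). The paper proposes Sherlock, an automated framework that lets service providers proactively safeguard web-augmented systems.
Reference score (independently computed attention metric): 38.96179816585195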
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Web-augmented large language models (LLMs) offer promising capabilities for automatic code generation. However, integrating live web search exposes models to unreliable or malicious content, leading to Search-Induced Issues (SII), a novel failure mode in which external pages mislead LLMs into producing incorrect code. This paper presents a comprehensive empirical study of the prevalence and impact of SII across three commercial search APIs and six advanced LLMs. Our analysis reveals that all evaluated web-augmented LLMs are vulnerable to SII, with root causes arising from either misaligned specifications or flawed code implementations in the searched Error-Inducing Pages (EIPs). To address this challenge, we propose Sherlock, an automated framework that enables LLM service providers to proactively safeguard web-augmented generation systems at scale. Sherlock operates as a continuous pipeline that first detects potential SII instances, then debugs them to identify the responsible EIPs and pinpoint their root causes, and finally repairs them by either annotating misaligned content or replacing erroneous code snippets with evaluated solutions from trusted sources. Experiments show that Sherlock identifies EIPs with an F1 score of up to 95% and repairs 71% to 100% of affected generations across the evaluated models, with modest computational overhead. Our findings and framework provide practical guidance for improving the reliability of web-augmented LLM-based code generation systems in real-world software engineering scenarios.
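Abstract (reference translation): Web-augmented large language models (LLMs) offer promising capabilities for automatic code generation. However, integrating live web search exposes models to unreliable or malicious content, leading to Search-Induced Issues (SII). This paper presents a comprehensive study of the prevalence and impact of SII across three commercial search APIs and six advanced LLMs. The analysis shows that all evaluated web-augmented LLMs are vulnerable to SII. To address this challenge, we propose Sherlock, an automated framework that enables LLM service providers to proactively safeguard web-augmented generation systems at scale. Sherlock operates as a continuous pipeline that first detects SII instances, then debugs them to identify the responsible Error-Inducing Pages (EIPs) and pinpoint their root causes, and finally repairs them by annotating misaligned content or replacing erroneous code snippets with evaluated solutions from trusted sources. Experiments show that Sherlock identifies EIPs with an F1 score of up to 95% and repairs 71% to 100% of affected generations across the evaluated models. Our findings and framework provide practical guidance for improving the reliability of LLM-based code generation systems in real-world software engineering scenarios.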
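
The detect, debug, and repair stages described in the abstract can be illustrated with a small Python sketch. This is not Sherlock's implementation, which the abstract does not specify. Every name here (`Page`, `detect_sii`, `find_eips`, `repair`, and the toy `generate`/`passes` stand-ins) is hypothetical, and leave-one-out attribution is only an assumed way of locating the responsible page.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    url: str
    text: str


# Hypothetical interfaces: generate(task, pages) returns code; passes(code) runs tests.
Generate = Callable[[str, List[Page]], str]
Passes = Callable[[str], bool]


def detect_sii(task: str, pages: List[Page], generate: Generate, passes: Passes) -> bool:
    """Flag a potential SII: the output fails with search results but passes without them."""
    return (not passes(generate(task, pages))) and passes(generate(task, []))


def find_eips(task: str, pages: List[Page], generate: Generate, passes: Passes) -> List[Page]:
    """Leave-one-out attribution (an assumption): a page is suspect if removing it fixes the output."""
    suspects = []
    for i, page in enumerate(pages):
        remaining = pages[:i] + pages[i + 1:]
        if passes(generate(task, remaining)):
            suspects.append(page)
    return suspects


def repair(page: Page, trusted_snippet: Optional[str] = None) -> Page:
    """Either annotate the page with a warning or replace its content with a trusted snippet."""
    if trusted_snippet is None:
        return Page(page.url, "[WARNING: content may not match the task]\n" + page.text)
    return Page(page.url, trusted_snippet)


# Toy stand-ins so the sketch runs without a real LLM or search API.
def toy_generate(task: str, pages: List[Page]) -> str:
    for p in pages:
        if "def add" in p.text:  # the toy model copies the first code it sees
            return p.text
    return "def add(a, b):\n    return a + b"


def toy_passes(code: str) -> bool:
    scope: dict = {}
    try:
        exec(code, scope)
        return scope["add"](2, 3) == 5
    except Exception:
        return False


good = Page("https://example.com/docs", "Addition returns the sum of its inputs.")
bad = Page("https://example.com/blog", "def add(a, b):\n    return a - b")
pages = [good, bad]

is_sii = detect_sii("write add(a, b)", pages, toy_generate, toy_passes)
eips = find_eips("write add(a, b)", pages, toy_generate, toy_passes)
fixed = repair(bad, trusted_snippet="def add(a, b):\n    return a + b")
repaired_ok = toy_passes(toy_generate("write add(a, b)", [good, fixed]))
```

In this toy run the flawed page is the only one whose removal restores a passing result, so it is the only suspect. Leave-one-out would miss cases where several pages are jointly at fault, which is one reason a real system would need a more careful debugging step.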


Related papers