Fugu-MT Paper Translation (Abstract): Tug-of-War within A Decade: Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generations

Paper overview: Tug-of-War within A Decade: Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generations

arxiv url: http://arxiv.org/abs/2604.14172v1
Date: Wed, 25 Mar 2026 07:32:38 GMT
Status: Translation complete
Last updated in system: 2026-04-19 19:09:11.697773
Title: Tug-of-War within A Decade: Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generations
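Title (reference translation): Tug-of-War within a Decade: Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generation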
Authors: Ziyin Zhou, Jianyi Zhang, Xu ji, Yilong Li, Jiameng Han, Zhangchi Zhao
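Abstract summary: Large Language Models (LLMs) are essential for analyzing and addressing vulnerabilities in cybersecurity. This paper focuses on the problem of knowledge discrepancy and conflict in CVE (Common Vulnerabilities and Exposures) detection and analysis. To address this problem, the authors propose an innovative two-stage framework called CRVA-TGRAG.
Reference score (independently computed attention score): 11.29615928080523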
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are essential for analyzing and addressing vulnerabilities in cybersecurity. However, of the more than 200,000 vulnerabilities discovered in the past decade, more than 30,000 have been changed or updated. This necessitates frequent updates to the training datasets and internal knowledge bases of LLMs to maintain knowledge consistency. In this paper, we focus on the problem of knowledge discrepancy and conflict within CVE (Common Vulnerabilities and Exposures) detection and analysis. This problem hinders LLMs' ability to retrieve the latest knowledge from their original training datasets, leading to knowledge conflicts, fabrication of factually incorrect results, and generation hallucinations. To address this problem, we propose an innovative two-stage framework called CRVA-TGRAG (Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generation). First, to improve document retrieval accuracy during the retrieval stage, we utilize Parent Document Segmentation and an ensemble retrieval scheme based on semantic similarity and inverted indexing. Second, to enhance LLMs' capabilities based on retrieval from the CVE dataset in the generation stage, we employ a teacher-guided preference optimization technique to fine-tune LLMs. Our framework not only enhances the quality of content retrieval through RAG but also leverages the advantages of preference fine-tuning in LLMs to answer questions more effectively and precisely. Experiments demonstrate that our method achieves higher accuracy in retrieving the latest CVEs compared to external knowledge bases. In conclusion, our framework significantly mitigates potential knowledge conflicts and inconsistencies that may arise from relying solely on LLMs for knowledge retrieval.
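Abstract (reference translation): Large Language Models (LLMs) are essential for analyzing and addressing cybersecurity vulnerabilities. However, more than 200,000 vulnerabilities have been discovered over the past decade, and more than 30,000 of them have been changed or updated. As a result, the training datasets and internal knowledge bases of LLMs must be updated frequently to keep their knowledge consistent. This paper focuses on the problem of knowledge discrepancy and conflict in CVE (Common Vulnerabilities and Exposures) detection and analysis. This problem hinders the ability of LLMs to retrieve the latest knowledge from their original training datasets and causes knowledge conflicts, the generation of factually incorrect results, and generation hallucinations. To address this problem, the authors propose an innovative two-stage framework called CRVA-TGRAG (Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generation). First, to improve document retrieval accuracy in the retrieval stage, they use Parent Document Segmentation and an ensemble retrieval scheme based on semantic similarity and inverted indexing. Second, to improve LLM capabilities based on retrieval from the CVE dataset in the generation stage, they fine-tune LLMs with a teacher-guided preference optimization technique. The framework not only improves the quality of content retrieval through RAG but also uses the advantages of preference fine-tuning in LLMs to answer questions more effectively and precisely. Experiments show that the method retrieves the latest CVEs with higher accuracy than external knowledge bases. In conclusion, the framework significantly mitigates the potential knowledge conflicts and inconsistencies that arise from relying solely on LLMs for knowledge retrieval.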
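
The retrieval stage described in the abstract (Parent Document Segmentation plus an ensemble of semantic-similarity and inverted-index retrieval) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words cosine score stands in for an embedding model, reciprocal rank fusion is an assumed way to combine the two rankings (the abstract does not name a fusion rule), and the class name `EnsembleRetriever`, the chunk size, the constant `k=60`, and the placeholder CVE records are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_parent(parent_id, text, size=8):
    """Cut one parent document into child chunks of up to `size` tokens."""
    tokens = tokenize(text)
    return [(parent_id, tokens[i:i + size]) for i in range(0, len(tokens), size)]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class EnsembleRetriever:
    """Hypothetical sketch: index child chunks, return parent document IDs."""

    def __init__(self, parents, chunk_size=8, k=60):
        self.k = k  # assumed reciprocal-rank-fusion constant
        self.chunks = []
        for pid, text in parents.items():
            self.chunks.extend(split_parent(pid, text, chunk_size))
        # Inverted index: token -> set of chunk positions containing it.
        self.inverted = defaultdict(set)
        for idx, (_, toks) in enumerate(self.chunks):
            for t in toks:
                self.inverted[t].add(idx)

    def _rank_similarity(self, q_tokens):
        # Stand-in for embedding similarity (assumption): bag-of-words cosine.
        q = Counter(q_tokens)
        scored = [(cosine(q, Counter(toks)), idx) for idx, (_, toks) in enumerate(self.chunks)]
        return [idx for score, idx in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]

    def _rank_inverted(self, q_tokens):
        # Rank chunks by how many distinct query terms they contain.
        hits = Counter()
        for t in set(q_tokens):
            for idx in self.inverted.get(t, ()):
                hits[idx] += 1
        return [idx for idx, _ in sorted(hits.items(), key=lambda h: (-h[1], h[0]))]

    def retrieve(self, query, top_n=1):
        q_tokens = tokenize(query)
        fused = defaultdict(float)
        for ranking in (self._rank_similarity(q_tokens), self._rank_inverted(q_tokens)):
            for rank, idx in enumerate(ranking, start=1):
                # Credit the chunk's parent, so scores aggregate per parent document.
                fused[self.chunks[idx][0]] += 1.0 / (self.k + rank)
        return sorted(fused, key=lambda pid: (-fused[pid], pid))[:top_n]


# Placeholder records with made-up IDs; not real CVE entries.
parents = {
    "CVE-0000-0001": "Buffer overflow in the image parser allows remote code execution via a crafted file",
    "CVE-0000-0002": "SQL injection in the login form allows attackers to bypass authentication",
}
retriever = EnsembleRetriever(parents)
print(retriever.retrieve("sql injection login"))
```

Matching on small child chunks while returning the parent ID keeps the match precise and still hands the generator the full record; in a real system the cosine stand-in would be replaced by a proper embedding model.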


Related papers