Fugu-MT Paper Translation (Abstract): A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Paper overview: A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

arxiv url: http://arxiv.org/abs/2508.18106v2
Date: Wed, 10 Sep 2025 07:24:52 GMT
Status: Translation complete
Last updated in system: 2025-09-11 15:16:52.162069
Title: A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Authors: Keke Lian, Bin Wang, Lei Zhang, Libo Chen, Junjie Wang, Ziming Zhao, Yujiu Yang, Haotong Duan, Haoran Zhao, Shuang Liao, Mingda Guo, Jiazheng Quan, Yilu Zhong, Chenhao He, Zichuan Chen, Jie Wu, Haoling Li, Zhaoxuan Li, Jiongchi Yu, Hui Li, Dong Zhang
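Abstract summary: A.S.E (AI Code Generation Security Evaluation) is a repository-level evaluation benchmark designed to closely mirror real-world AI programming tasks. An evaluation of large language models (LLMs) on A.S.E yielded several key findings.
Reference score (independently computed attention metric): 48.10068691540979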
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The increasing adoption of large language models (LLMs) in software engineering necessitates rigorous security evaluation of their generated code. However, existing benchmarks often lack relevance to real-world AI programming scenarios, making them inadequate for assessing the practical security risks associated with AI-generated code in production environments. To address this gap, we introduce A.S.E (AI Code Generation Security Evaluation), a repository-level evaluation benchmark designed to closely mirror real-world AI programming tasks, offering a comprehensive and reliable framework for assessing the security of AI-generated code. Our evaluation of leading LLMs on A.S.E reveals several key findings. In particular, current LLMs still struggle with secure coding. The complexity of repository-level scenarios presents challenges for LLMs that typically perform well on snippet-level tasks. Moreover, a larger reasoning budget does not necessarily lead to better code generation. These observations offer valuable insights into the current state of AI code generation, assisting developers in selecting the most appropriate models for practical tasks, while laying the foundation for refining LLMs to generate secure and efficient code in real-world applications.

Related papers