Fugu-MT Paper Translation (Abstract): AICD Bench: A Challenging Benchmark for AI-Generated Code Detection

Paper summary: AICD Bench: A Challenging Benchmark for AI-Generated Code Detection

arxiv url: http://arxiv.org/abs/2602.02079v1
Date: Mon, 02 Feb 2026 13:24:14 GMT
Status: Translation complete
Last updated in system: 2026-02-03 19:28:34.164074
Title: AICD Bench: A Challenging Benchmark for AI-Generated Code Detection
Title (reference translation): AICD Bench: A Benchmark for AI-Generated Code Detection
Authors: Daniil Orel, Dilshod Azizov, Indraneil Paul, Yuxia Wang, Iryna Gurevych, Preslav Nakov
Abstract summary: AICD Bench is the most comprehensive benchmark for AI-generated code detection. It spans 2M examples, 77 models across 11 families, and 9 programming languages, including recent reasoning models.
Reference score (independently computed attention score): 91.21422299346199
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are increasingly capable of generating functional source code, raising concerns about authorship, accountability, and security. While detecting AI-generated code is critical, existing datasets and benchmarks are narrow, typically limited to binary human-machine classification under in-distribution settings. To bridge this gap, we introduce $\emph{AICD Bench}$, the most comprehensive benchmark for AI-generated code detection. It spans $\emph{2M examples}$, $\emph{77 models}$ across $\emph{11 families}$, and $\emph{9 programming languages}$, including recent reasoning models. Beyond scale, AICD Bench introduces three realistic detection tasks: ($\emph{i}$)~$\emph{Robust Binary Classification}$ under distribution shifts in language and domain, ($\emph{ii}$)~$\emph{Model Family Attribution}$, grouping generators by architectural lineage, and ($\emph{iii}$)~$\emph{Fine-Grained Human-Machine Classification}$ across human, machine, hybrid, and adversarial code. Extensive evaluation on neural and classical detectors shows that performance remains far below practical usability, particularly under distribution shift and for hybrid or adversarial code. We release AICD Bench as a $\emph{unified, challenging evaluation suite}$ to drive the next generation of robust approaches for AI-generated code detection. The data and the code are available at https://huggingface.co/AICD-bench.
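Abstract (reference translation): Large language models (LLMs) are increasingly capable of generating functional source code, raising concerns about authorship, accountability, and security. Detecting AI-generated code is important, but existing datasets and benchmarks are narrow. To bridge this gap, we introduce $\emph{AICD Bench}$, the most comprehensive benchmark for AI-generated code detection. It spans $\emph{2M examples}$, $\emph{77 models}$ across $\emph{11 families}$, and $\emph{9 languages}$. Beyond scale, AICD Bench introduces three realistic detection tasks: ($\emph{i}$)~$\emph{Robust Binary Classification}$ under distribution shifts in language and domain, ($\emph{ii}$)~$\emph{Model Family Attribution}$, grouping generators by architectural lineage, and ($\emph{iii}$)~$\emph{Fine-Grained Human-Machine Classification}$ across human, machine, hybrid, and adversarial code. Extensive evaluation of neural and classical detectors shows that performance remains far below practical usability, particularly under distribution shift and for hybrid or adversarial code. We release AICD Bench as a $\emph{unified, challenging evaluation suite}$ to drive the next generation of robust approaches for AI-generated code detection. The data and code are available at https://huggingface.co/AICD-bench.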


Related papers