Fugu-MT Paper Translation (Abstract): HybridCodeAuthorship: A Benchmark Dataset for Line-Level Code Authorship Detection

Paper overview: HybridCodeAuthorship: A Benchmark Dataset for Line-Level Code Authorship Detection

arxiv url: http://arxiv.org/abs/2606.12620v1
Date: Wed, 10 Jun 2026 19:21:19 GMT
Status: Translation complete
System update date: 2026-06-12 15:55:27.429015
Title: HybridCodeAuthorship: A Benchmark Dataset for Line-Level Code Authorship Detection
Authors: Luke Patterson, Li Wang, Adam Faulkner
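Abstract summary: HybridCodeAuthorship is a novel benchmark of Python code files with interleaved human- and AI-authored lines of code. The authors then benchmark the performance of two state-of-the-art AI-generated code detection algorithms at both the line and chunk levels.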
Reference score (independently calculated attention score): 1.6385993554828697
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Thanks to the rapid adoption of AI code assistants powered by large language models (LLMs), industry codebases are increasingly a hybrid of AI- and human-authored code. For risk management and productivity analysis purposes, it is crucial to enable fine-grained location detection of AI-generated code. To develop algorithms for this task, quality benchmarks are needed to assess performance. However, existing benchmarks tend to comprise academic, LeetCode-style problems and presume a code snippet is either completely human-authored or completely AI-authored, which does not reflect the diverse intents and styles of industry codebases that use AI code assistants. To fill these gaps, we introduce HybridCodeAuthorship, a novel benchmark of Python code files with interleaved human- and AI-authored lines of code to simulate authentic use of AI code assistants. In this paper, we first present our dataset construction pipeline, which leverages CodeSearchNet, a massive collection of links to open-source repositories on GitHub. We then benchmark the performance of two state-of-the-art AI-generated code detection algorithms at both the line and chunk levels. Experimental results demonstrate that HybridCodeAuthorship is a challenging benchmark: the top-scoring algorithm, AIGCode Detector, obtains best F1 scores of 0.48 and 0.56 on the chunk-level and line-level code detection tasks, respectively.
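To make the reported metric concrete, the following is a minimal sketch of how F1 might be computed at the line level and at a coarser chunk level. It is not the paper's evaluation code. The toy labels, the helper names `f1_score` and `to_chunks`, the fixed chunk size, and the rule that a chunk counts as AI-authored if any of its lines is AI-authored are all assumptions made for illustration; the abstract does not define how chunks are formed.

```python
from typing import List


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 with 1 = AI-authored as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def to_chunks(labels: List[int], size: int) -> List[int]:
    """Hypothetical chunking: fixed-size windows of lines.

    Assumption: a chunk is labeled AI-authored (1) if any line in it is.
    """
    return [int(any(labels[i:i + size])) for i in range(0, len(labels), size)]


# Toy per-line labels for one file (1 = AI-authored, 0 = human-authored).
line_true = [0, 0, 1, 1, 0, 1, 0, 0]
line_pred = [0, 1, 1, 0, 0, 1, 0, 0]

line_f1 = f1_score(line_true, line_pred)
chunk_f1 = f1_score(to_chunks(line_true, 4), to_chunks(line_pred, 4))

print(f"line-level F1:  {line_f1:.2f}")
print(f"chunk-level F1: {chunk_f1:.2f}")
```

Under this particular chunking rule the toy chunk-level score comes out higher than the line-level one, which shows only that the two granularities can diverge; the direction and size of the gap depend on how chunks are defined and labeled.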
