Fugu-MT Paper Translation (Summary): Outrunning LLM Cutoffs: A Live Kernel Crash Resolution Benchmark for All

Paper summary: Outrunning LLM Cutoffs: A Live Kernel Crash Resolution Benchmark for All

arxiv url: http://arxiv.org/abs/2602.02690v1
Date: Mon, 02 Feb 2026 19:06:15 GMT
Status: Translation complete
Last updated in system: 2026-02-04 18:37:15.020899
Title: Outrunning LLM Cutoffs: A Live Kernel Crash Resolution Benchmark for All
Authors: Chenxi Huang, Alex Mathai, Feiyang Yu, Aleksandr Nogikh, Petros Maniatis, Franjo Ivančić, Eugene Wu, Kostis Kaffes, Junfeng Yang, Baishakhi Ray
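Abstract summary: Live-kBench is an evaluation framework for self-evolving benchmarks that continuously scrapes freshly discovered kernel bugs and evaluates agents on them. kEnv is an agent-agnostic crash-resolution environment for kernel compilation, execution, and feedback. Using kEnv, the authors benchmark three state-of-the-art agents and show that they resolve 74% of crashes on the first attempt.
Reference score (in-house attention score): 57.23434868678603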
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Repairing system crashes discovered by kernel fuzzers like Syzkaller is a critical yet underexplored challenge in software engineering. While recent works have introduced Large Language Model (LLM) based agents for Linux kernel crash-resolution, their evaluation benchmarks are usually static and thus, do not capture the evolving nature of the Linux kernel, and suffer from potential data contamination due to LLM knowledge cutoffs. To address the above problem, we present (i) Live-kBench, an evaluation framework for self-evolving benchmarks that continuously scrapes and evaluates agents on freshly discovered kernel bugs, and (ii) kEnv, an agent-agnostic standardized crash-resolution environment for kernel compilation, execution, and feedback. This design decouples agent workflows from heavy-weight execution, enabling fair and scalable comparison across diverse agent frameworks under identical conditions. To this end, we curate an inaugural dataset of 534 Linux kernel bugs and empirically demonstrate a significant performance gap, with agents achieving up to 25% higher equivalent patch rate on bugs fixed before the LLM knowledge cutoff. Using kEnv, we benchmark three state-of-the-art agents, showing that they resolve 74% of crashes on the first attempt (plausible patches); however, only ~20% of generated patches closely match developer fixes. Additionally, exposing crash resolution feedback improves crash resolution rate by 29%. Live-kBench provides the community with an evaluation infrastructure for self-evolving benchmarks that is both time and attribute sensitive, complete with a public dashboard to track agent progress on Linux kernel bugs.
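The contamination check described in the abstract amounts to splitting evaluated bugs at a model's knowledge cutoff and comparing metrics on each side. The following is a minimal sketch under stated assumptions, not the Live-kBench implementation: the `BugResult` record, its field names, and the helpers `rate` and `cutoff_gap` are hypothetical, "plausible" is taken to mean the patch stops the reproducer from crashing, and "equivalent" to mean it closely matches the developer fix. The toy records are illustrative only and are not data from the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BugResult:
    """Outcome of one agent run on one kernel bug (hypothetical record layout)."""
    bug_id: str
    fix_date: date      # date the upstream developer fix landed
    plausible: bool     # assumed: agent patch stops the reproducer from crashing
    equivalent: bool    # assumed: agent patch closely matches the developer fix


def rate(results, attr):
    """Fraction of results whose boolean attribute `attr` is True."""
    if not results:
        return 0.0
    return sum(getattr(r, attr) for r in results) / len(results)


def cutoff_gap(results, cutoff):
    """Split bugs at an LLM knowledge cutoff and compare equivalent-patch rates.

    Bugs fixed before the cutoff may appear in training data, so a higher
    rate on that side suggests contamination rather than genuine skill.
    The gap is an absolute difference in rates (hypothetical definition).
    """
    before = [r for r in results if r.fix_date < cutoff]
    after = [r for r in results if r.fix_date >= cutoff]
    return {
        "n_before": len(before),
        "n_after": len(after),
        "equiv_before": rate(before, "equivalent"),
        "equiv_after": rate(after, "equivalent"),
        "gap": rate(before, "equivalent") - rate(after, "equivalent"),
    }


# Toy records for illustration only (not results from the paper).
results = [
    BugResult("bug-a", date(2024, 3, 1), True, True),
    BugResult("bug-b", date(2024, 5, 9), True, False),
    BugResult("bug-c", date(2025, 2, 2), True, False),
    BugResult("bug-d", date(2025, 6, 20), False, False),
]

print("first-attempt plausible rate:", rate(results, "plausible"))
print(cutoff_gap(results, cutoff=date(2024, 10, 1)))
```

Because a live benchmark keeps adding bugs fixed after any given model's cutoff, the `after` partition keeps growing over time; this is the property that lets a self-evolving benchmark sidestep contamination for newer models.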

