Fugu-MT Paper Translation (Abstract): CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering

Paper summary: CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering

arxiv url: http://arxiv.org/abs/2604.03750v1
Date: Sat, 04 Apr 2026 14:51:09 GMT
Status: Translation complete
System update date: 2026-04-07 15:49:18.751034
Title: CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering
Title (reference translation): CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering
Authors: Baicheng Chen, Yu Wang, Ziheng Zhou, Xiangru Liu, Juanru Li, Yilei Chen, Tianxing He
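Abstract summary: We study the cryptographic binary reverse engineering capabilities of large language models (LLMs). The benchmark comprises 432 challenges built from 48 standard cryptographic algorithms, 3 insecure crypto key usage scenarios, and 3 difficulty levels. We also establish a strong human expert baseline of 92.19 points, showing that humans maintain an advantage in cryptographic RE tasks.
Reference score (independently computed attention score): 12.401873262343862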
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reverse engineering (RE) is central to software security, particularly for cryptographic programs that handle sensitive data and are highly prone to vulnerabilities. It supports critical tasks such as vulnerability discovery and malware analysis. Despite its importance, RE remains labor-intensive and requires substantial expertise, making large language models (LLMs) a potential solution for automating the process. However, their capabilities for RE remain systematically underexplored. To address this gap, we study the cryptographic binary RE capabilities of LLMs and introduce CREBench, a benchmark comprising 432 challenges built from 48 standard cryptographic algorithms, 3 insecure crypto key usage scenarios, and 3 difficulty levels. Each challenge follows a Capture-the-Flag (CTF) RE challenge, requiring the model to analyze the underlying cryptographic logic and recover the correct input. We design an evaluation framework comprising four sub-tasks, from algorithm identification to correct flag recovery. We evaluate eight frontier LLMs on CREBench. GPT-5.4, the best-performing model, achieves 64.03 out of 100 and recovers the flag in 59% of challenges. We also establish a strong human expert baseline of 92.19 points, showing that humans maintain an advantage in cryptographic RE tasks. Our code and dataset are available at https://github.com/wangyu-ovo/CREBench.
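Abstract (reference translation): Reverse engineering (RE) is central to software security, especially for cryptographic programs that handle sensitive data and are highly prone to vulnerabilities. It supports important tasks such as vulnerability discovery and malware analysis. Despite its importance, RE is labor-intensive and requires considerable expertise, which makes large language models (LLMs) a potential way to automate the process. However, their RE capabilities have not been systematically explored. To address this gap, we study the cryptographic binary RE capabilities of LLMs and introduce CREBench, a benchmark of 432 challenges built from 48 standard cryptographic algorithms, 3 insecure crypto key usage scenarios, and 3 difficulty levels. Each challenge follows the format of a Capture-the-Flag (CTF) RE challenge and requires the model to analyze the underlying cryptographic logic and recover the correct input. We design an evaluation framework of four sub-tasks, ranging from algorithm identification to flag recovery. We evaluate eight frontier LLMs on CREBench. GPT-5.4, the best-performing model, scores 64.03 out of 100 and recovers the flag in 59% of challenges. We also establish a strong human expert baseline of 92.19 points, showing that humans maintain an advantage in cryptographic RE tasks. Our code and dataset are available at https://github.com/wangyu-ovo/CREBench.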
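
The challenge count in the abstract is consistent with a full cross product of its three dimensions (48 × 3 × 3 = 432). The following is a minimal sketch of that arithmetic, assuming every algorithm is paired with every key usage scenario and every difficulty level, which the abstract does not state explicitly. The labels are hypothetical placeholders, since the abstract gives only the counts.

```python
from itertools import product

# Hypothetical placeholder labels: the abstract reports only the counts
# (48 algorithms, 3 key usage scenarios, 3 difficulty levels).
algorithms = [f"alg_{i:02d}" for i in range(48)]
key_scenarios = [f"key_scenario_{i}" for i in range(3)]
difficulties = [f"level_{i}" for i in range(3)]

# Assumption: one challenge per (algorithm, scenario, difficulty) combination.
challenges = list(product(algorithms, key_scenarios, difficulties))

print(len(challenges))  # 432
```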

Related papers