Fugu-MT Paper Translation (Abstract): AutoDFBench 1.0: A Benchmarking Framework for Digital Forensic Tool Testing and Generated Code Evaluation

Paper summary: AutoDFBench 1.0: A Benchmarking Framework for Digital Forensic Tool Testing and Generated Code Evaluation

arxiv url: http://arxiv.org/abs/2512.16965v1
Date: Thu, 18 Dec 2025 11:16:33 GMT
Status: Translation complete
Last updated in system: 2025-12-22 19:25:54.130404
Title: AutoDFBench 1.0: A Benchmarking Framework for Digital Forensic Tool Testing and Generated Code Evaluation
Authors: Akila Wickramasekara, Tharusha Mihiranga, Aruna Withanage, Buddhima Weerasinghe, Frank Breitinger, John Sheppard, Mark Scanlon,
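Abstract summary: This paper introduces AutoDFBench 1.0, a modular benchmarking framework that supports the evaluation of both conventional DF tools and scripts, as well as AI-generated code and agentic approaches. The framework enables fair and reproducible comparison across tools and forensic scripts.
Reference score (independently computed attention score): 0.0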
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The National Institute of Standards and Technology (NIST) Computer Forensic Tool Testing (CFTT) programme has become the de facto standard for digital forensic tool testing and validation. However, to date, no comprehensive framework exists to automate benchmarking across the diverse forensic tasks included in the programme. This gap results in inconsistent validation, difficulty in comparing tools, and limited reproducibility of validation results. This paper introduces AutoDFBench 1.0, a modular benchmarking framework that supports the evaluation of both conventional DF tools and scripts, as well as AI-generated code and agentic approaches. The framework integrates five areas defined by the CFTT programme: string search, deleted file recovery, file carving, Windows registry recovery, and SQLite data recovery. AutoDFBench 1.0 includes ground truth data comprising 63 test cases and 10,968 unique test scenarios, and executes evaluations through a RESTful API that produces structured JSON outputs with standardised metrics, including precision, recall, and F1 score for each test case; the average of these F1 scores is the AutoDFBench Score. The benchmarking framework is validated against CFTT's datasets. The framework enables fair and reproducible comparison across tools and forensic scripts, establishing the first unified, automated, and extensible benchmarking framework for digital forensic tool testing and validation. AutoDFBench 1.0 supports tool vendors, researchers, practitioners, and standardisation bodies by facilitating transparent, reproducible, and comparable assessments of DF technologies.
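The scoring rule described in the abstract (precision, recall, and F1 score per test case, averaged into the AutoDFBench Score) can be illustrated with a minimal Python sketch. This is not the framework's code: the `CaseResult` structure, its field names, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration only. The abstract does not say how true and false positives are counted within a test case, so the counts here are simply given as inputs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseResult:
    """Match counts for one test case (hypothetical structure, not the framework's schema)."""
    name: str
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(r: CaseResult) -> float:
    # Share of reported items that are in the ground truth.
    denom = r.true_positives + r.false_positives
    return r.true_positives / denom if denom else 0.0  # assumed convention for 0/0


def recall(r: CaseResult) -> float:
    # Share of ground-truth items that were reported.
    denom = r.true_positives + r.false_negatives
    return r.true_positives / denom if denom else 0.0  # assumed convention for 0/0


def f1_score(r: CaseResult) -> float:
    # Harmonic mean of precision and recall.
    p, q = precision(r), recall(r)
    return 2 * p * q / (p + q) if (p + q) else 0.0


def autodfbench_score(results: list[CaseResult]) -> float:
    # Per the abstract: the average of the per-test-case F1 scores.
    if not results:
        raise ValueError("at least one test case result is required")
    return mean(f1_score(r) for r in results)


if __name__ == "__main__":
    # Made-up counts purely for demonstration; not results from the paper.
    demo = [
        CaseResult("case_a", true_positives=8, false_positives=2, false_negatives=2),
        CaseResult("case_b", true_positives=5, false_positives=0, false_negatives=5),
    ]
    for r in demo:
        print(r.name, round(precision(r), 3), round(recall(r), 3), round(f1_score(r), 3))
    print("score", round(autodfbench_score(demo), 3))
```

Because the score is a plain average of per-case F1 values, every test case carries equal weight regardless of how many scenarios it contains. That reading follows the abstract's wording and may differ from the details in the full paper.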


Related papers