Fugu-MT Paper Translation (Summary): Evaluating the Effectiveness of Coverage-Guided Fuzzing for Testing Deep Learning Library APIs

Paper summary: Evaluating the Effectiveness of Coverage-Guided Fuzzing for Testing Deep Learning Library APIs

arxiv url: http://arxiv.org/abs/2509.14626v1
Date: Thu, 18 Sep 2025 05:10:42 GMT
Status: Translation complete
Last updated in system: 2025-09-19 17:26:53.07482
Title: Evaluating the Effectiveness of Coverage-Guided Fuzzing for Testing Deep Learning Library APIs
Title (reference translation): Evaluating the effectiveness of coverage-guided fuzzing for testing deep learning library APIs
Authors: Feiran Qin, M. M. Abid Naziri, Hengyu Ai, Saikat Dutta, Marcelo d'Amorim
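Abstract summary: We propose FlashFuzz, which uses LLMs to automatically synthesize API-level harnesses by combining templates, helper functions, and API documentation. Compared to state-of-the-art fuzzing methods, FlashFuzz achieves 101.13 to 212.88 percent higher coverage and a 1.0x to 5.4x higher validity rate. This study confirms that CGF can be effectively applied to deep learning libraries and provides a strong baseline for future testing approaches.
Reference score (attention score computed in-house): 3.491101173753068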
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep Learning (DL) libraries such as PyTorch provide the core components to build major AI-enabled applications. Finding bugs in these libraries is important and challenging. Prior approaches have tackled this by performing either API-level fuzzing or model-level fuzzing, but they do not use coverage guidance, which limits their effectiveness and efficiency. This raises an intriguing question: can coverage guided fuzzing (CGF), in particular frameworks like LibFuzzer, be effectively applied to DL libraries, and does it offer meaningful improvements in code coverage, bug detection, and scalability compared to prior methods? We present the first in-depth study to answer this question. A key challenge in applying CGF to DL libraries is the need to create a test harness for each API that can transform byte-level fuzzer inputs into valid API inputs. To address this, we propose FlashFuzz, a technique that leverages Large Language Models (LLMs) to automatically synthesize API-level harnesses by combining templates, helper functions, and API documentation. FlashFuzz uses a feedback driven strategy to iteratively synthesize and repair harnesses. With this approach, FlashFuzz synthesizes harnesses for 1,151 PyTorch and 662 TensorFlow APIs. Compared to state-of-the-art fuzzing methods (ACETest, PathFinder, and TitanFuzz), FlashFuzz achieves up to 101.13 to 212.88 percent higher coverage and 1.0x to 5.4x higher validity rate, while also delivering 1x to 1182x speedups in input generation. FlashFuzz has discovered 42 previously unknown bugs in PyTorch and TensorFlow, 8 of which are already fixed. Our study confirms that CGF can be effectively applied to DL libraries and provides a strong baseline for future testing approaches.
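The abstract's central obstacle is that each API needs a harness that turns LibFuzzer's raw byte input into *valid* API arguments. The sketch below illustrates only that idea, under stated assumptions. It is not a FlashFuzz harness: those target real PyTorch and TensorFlow APIs, and LibFuzzer harnesses are normally C/C++ functions built around the `LLVMFuzzerTestOneInput` entry point. `ByteReader`, `toy_matmul`, `harness`, and `MAX_DIM` are hypothetical names, and `toy_matmul` is a pure-Python stand-in for a library API. The harness draws the shared inner dimension `k` once, so every decoded pair of matrices is shape-compatible by construction.

```python
import struct
from typing import List

MAX_DIM = 4  # hypothetical bound that keeps decoded tensors small

class ByteReader:
    """Hypothetical helper in the spirit of LibFuzzer's FuzzedDataProvider:
    consumes raw fuzzer bytes and falls back to defaults once they run out."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def consume_int(self, lo: int, hi: int) -> int:
        # Map one byte into [lo, hi]; return lo when the input is exhausted.
        if self.pos >= len(self.data):
            return lo
        b = self.data[self.pos]
        self.pos += 1
        return lo + b % (hi - lo + 1)

    def consume_float(self) -> float:
        # Read 4 bytes as a little-endian float32, zero-padding a short tail.
        chunk = self.data[self.pos:self.pos + 4].ljust(4, b"\x00")
        self.pos += 4
        return struct.unpack("<f", chunk)[0]

def decode_matrix(reader: ByteReader, rows: int, cols: int) -> List[List[float]]:
    return [[reader.consume_float() for _ in range(cols)] for _ in range(rows)]

def toy_matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    # Hypothetical stand-in for a DL library API under test.
    if len(a[0]) != len(b):
        raise ValueError("shape mismatch")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def harness(data: bytes) -> List[List[float]]:
    # Decode shapes first, choosing k once so that (m, k) @ (k, n) is valid.
    reader = ByteReader(data)
    m = reader.consume_int(1, MAX_DIM)
    k = reader.consume_int(1, MAX_DIM)
    n = reader.consume_int(1, MAX_DIM)
    a = decode_matrix(reader, m, k)
    b = decode_matrix(reader, k, n)
    return toy_matmul(a, b)
```

For example, `harness(bytes([1, 2, 3]))` decodes the shapes (2, 3) and (3, 4). Because no bytes remain for the values, both matrices are zero-filled, and the call returns a 2×4 result. Spending the first bytes on shape decisions is a design choice that keeps every input valid. The trade-off is that the harness can never exercise the API's own shape-mismatch error path.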
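Abstract (reference translation): Deep learning (DL) libraries such as PyTorch provide the core components for building major AI-enabled applications. Finding bugs in these libraries is important and challenging. Prior approaches have tackled this by performing either API-level fuzzing or model-level fuzzing, but they do not use coverage guidance, which limits their effectiveness and efficiency. This raises a question: can coverage-guided fuzzing (CGF), in particular frameworks such as LibFuzzer, be effectively applied to DL libraries? We present the first in-depth study to answer this question. A key challenge in applying CGF to DL libraries is the need to create, for each API, a test harness that can transform byte-level fuzzer inputs into valid API inputs. To address this, we propose FlashFuzz, a technique that uses Large Language Models (LLMs) to automatically synthesize API-level harnesses by combining templates, helper functions, and API documentation. FlashFuzz uses a feedback-driven strategy to iteratively synthesize and repair harnesses. With this approach, FlashFuzz synthesizes harnesses for 1,151 PyTorch and 662 TensorFlow APIs. Compared to state-of-the-art fuzzing methods (ACETest, PathFinder, and TitanFuzz), FlashFuzz achieves 101.13 to 212.88 percent higher coverage and a 1.0x to 5.4x higher validity rate, while also delivering 1x to 1182x speedups in input generation. FlashFuzz has discovered 42 previously unknown bugs in PyTorch and TensorFlow, 8 of which are already fixed. This study confirms that CGF can be effectively applied to DL libraries and provides a strong baseline for future testing approaches.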
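The abstract attributes the limits of prior DL-library fuzzers to their lack of coverage guidance. The following toy sketch shows what coverage guidance means in a LibFuzzer-style fuzzer. The fuzzer mutates an input taken from the corpus, runs it, and keeps it only if it reaches code not seen before. This is an illustration, not LibFuzzer or FlashFuzz. `target`, `mutate`, and `fuzz` are hypothetical names. The target records hand-labelled branch IDs, whereas LibFuzzer relies on compiler instrumentation for its coverage feedback.

```python
import random
from typing import List, Set, Tuple

def target(data: bytes, covered: Set[str]) -> None:
    # Hypothetical "instrumented" program: records each branch ID it executes.
    covered.add("entry")
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("b1")
        if len(data) > 1 and data[1] == ord("U"):
            covered.add("b2")
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add("b3")

def mutate(data: bytes, rng: random.Random) -> bytes:
    # Apply one random byte-level mutation: insert, overwrite, or delete.
    buf = bytearray(data)
    op = rng.randrange(3)
    if op == 0 or not buf:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif op == 1:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)

def fuzz(iterations: int, seed: int = 0) -> Tuple[List[bytes], Set[str]]:
    rng = random.Random(seed)
    corpus: List[bytes] = [b""]
    global_cov: Set[str] = set()
    target(b"", global_cov)
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        cov: Set[str] = set()
        target(child, cov)
        # Coverage guidance: keep the input only if it hit a new branch.
        if not cov <= global_cov:
            global_cov |= cov
            corpus.append(child)
    return corpus, global_cov
```

The fuzzer keeps every input that reaches a new branch, so it can satisfy the nested `F`, `U`, `Z` checks one byte at a time. A generator without coverage feedback would have to guess all three bytes at once.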


Related papers