Fugu-MT Paper Translation (Summary): BASFuzz: Towards Robustness Evaluation of LLM-based NLP Software via Automated Fuzz Testing

Paper summary: BASFuzz: Towards Robustness Evaluation of LLM-based NLP Software via Automated Fuzz Testing

arxiv url: http://arxiv.org/abs/2509.17335v1
Date: Mon, 22 Sep 2025 03:13:57 GMT
Status: Translation complete
Last updated in system: 2025-09-30 14:40:40.544585
Title: BASFuzz: Towards Robustness Evaluation of LLM-based NLP Software via Automated Fuzz Testing
Authors: Mingxuan Xiao, Yan Xiao, Shunhui Ji, Jiahe Tu, Pengcheng Zhang
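Abstract summary: BASFuzz is an efficient fuzz testing method tailored for large language model (LLM)-based NLP software. It employs a Beam-Annealing Search algorithm, which integrates beam search and simulated annealing, to design an efficient fuzzing loop. In experiments, BASFuzz achieves a testing effectiveness of 90.335% while reducing the average time overhead by 2,163.852 seconds compared to the current best baseline.
Reference score (independently computed attention measure): 8.893978269498524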
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Fuzzing has shown great success in evaluating the robustness of intelligent natural language processing (NLP) software. As large language model (LLM)-based NLP software is widely deployed in critical industries, existing methods still face two main challenges: (1) testing methods are insufficiently coupled with the behavioral patterns of LLM-based NLP software; (2) fuzzing capability generally degrades in the testing scenario of natural language generation (NLG). To address these issues, we propose BASFuzz, an efficient fuzz testing method tailored for LLM-based NLP software. BASFuzz targets complete test inputs composed of prompts and examples, and uses a text consistency metric to guide mutations in the fuzzing loop, aligning with the behavioral patterns of LLM-based NLP software. A Beam-Annealing Search algorithm, which integrates beam search and simulated annealing, is employed to design an efficient fuzzing loop. In addition, information entropy-based adaptive adjustment and an elitism strategy further enhance fuzzing capability. We evaluate BASFuzz on six datasets in representative scenarios of NLG and natural language understanding (NLU). Experimental results demonstrate that BASFuzz achieves a testing effectiveness of 90.335% while reducing the average time overhead by 2,163.852 seconds compared to the current best baseline, enabling more effective robustness evaluation prior to software deployment.
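The abstract names the ingredients of the search (beam search, simulated annealing, elitism) but not how they are combined. The following is a minimal illustrative sketch of one way such a loop could look, not the authors' implementation. Everything in it is an assumption made for illustration: the function `beam_annealing_search`, its parameters (`beam_width`, `steps`, `t0`, `cooling`), the toy synonym table, and the toy score, which merely stands in for the paper's text consistency metric. The information entropy-based adaptive adjustment is not modeled.

```python
import math
import random
from typing import Callable, List, Tuple

Candidate = Tuple[str, ...]        # a test input, tokenised into words
Scored = Tuple[float, Candidate]   # (score, candidate)


def beam_annealing_search(
    seed: Candidate,
    mutate: Callable[[Candidate, random.Random], List[Candidate]],
    score: Callable[[Candidate], float],
    beam_width: int = 3,
    steps: int = 20,
    t0: float = 1.0,
    cooling: float = 0.9,
    rng_seed: int = 0,
) -> Scored:
    """Hypothetical hybrid of beam search and simulated annealing.

    Higher scores mean the candidate is a "better" test input.
    """
    rng = random.Random(rng_seed)
    best: Scored = (score(seed), seed)
    beam: List[Scored] = [best]
    temp = t0
    for _ in range(steps):
        # Elitism: the best input found so far always stays in the pool.
        pool = {best[1]: best[0]}
        for parent_score, parent in beam:
            for child in mutate(parent, rng):
                child_score = score(child)
                delta = child_score - parent_score
                # Annealing rule: keep improvements, and keep a worse
                # child with probability exp(delta / temp).
                if delta >= 0 or rng.random() < math.exp(delta / temp):
                    pool[child] = child_score
        # Beam step: keep only the top `beam_width` candidates.
        ranked = sorted(
            ((s, c) for c, s in pool.items()),
            key=lambda sc: sc[0],
            reverse=True,
        )
        beam = ranked[:beam_width]
        best = beam[0]
        temp = max(temp * cooling, 1e-9)  # geometric cooling schedule
    return best


# Toy setup (assumed for illustration only).
SEED: Candidate = ("the", "movie", "was", "good", "and", "fun")
SYNONYMS = {"good": ["fine", "great"], "fun": ["amusing"], "movie": ["film"]}


def toy_mutate(words: Candidate, rng: random.Random) -> List[Candidate]:
    """Replace one word at a time with a synonym (rng is unused here)."""
    out: List[Candidate] = []
    for i, word in enumerate(words):
        for alt in SYNONYMS.get(word, []):
            out.append(words[:i] + (alt,) + words[i + 1:])
    return out


def toy_score(words: Candidate) -> float:
    """Stand-in objective: number of positions that differ from the seed."""
    return float(sum(a != b for a, b in zip(words, SEED)))


best_score, best_input = beam_annealing_search(SEED, toy_mutate, toy_score)
```

In this toy setup a mutation never lowers the score, so the probabilistic acceptance branch is not exercised; with a real metric, that branch is what would let the beam move past locally worse candidates while the elitism step guards the best input found so far.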
