Fugu-MT Paper Translation (Summary): A Benchmark Dataset And LLMs Comparison For NFR Classification With Explainable AI

Paper overview: A Benchmark Dataset And LLMs Comparison For NFR Classification With Explainable AI

arxiv url: http://arxiv.org/abs/2510.18096v1
Date: Mon, 20 Oct 2025 20:45:15 GMT
Status: Translation completed
Last updated in system: 2025-10-25 03:08:12.618137
Title: A Benchmark Dataset And LLMs Comparison For NFR Classification With Explainable AI
Title (reference translation): A Comparison of NFR Classification Using a Benchmark Dataset and LLMs, and Explainable AI
Authors: Esrat Ebtida Sakib, MD Ahnaf Akib, Md Muktadir Mazumder, Maliha Noushin Raida, Md. Mohsinul Kabir,
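Abstract summary: Non-Functional Requirements (NFRs) play a critical role in determining the overall quality and user satisfaction of software systems. We collected NFRs from various project charters and open-source software documentation. We categorized the NFRs into sub-classes and identified needs using widely used large language models.
Reference score (independently computed notability score): 0.21748200848556345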
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Non-Functional Requirements (NFRs) play a critical role in determining the overall quality and user satisfaction of software systems. Accurately identifying and classifying NFRs is essential to ensure that software meets performance, usability, and reliability expectations. However, manual identification of NFRs from documentation is time-consuming and prone to errors, necessitating automated solutions. Before implementing any automated solution, a robust and comprehensive dataset is essential. To build such a dataset, we collected NFRs from various Project Charters and Open Source Software Documentation. This enhanced the technical depth and usability of an already existing NFR dataset. We categorized NFRs into sub-classes and identified needs using widely used Large Language Models (LLMs) to facilitate automation. After classifying the NFRs, we compared the classification results of the selected LLMs: RoBERTa, CodeBERT, Gemma-2, Phi-3, Mistral-8B, and Llama-3.1-8B using various evaluation metrics, including precision, recall, F1-score, and LIME scores. Among these models, Gemma-2 achieved the best results with a precision of 0.87, recall of 0.89, and F1-score of 0.88, alongside a LIME hit score of 78 out of 80. Phi-3 closely followed with a precision of 0.85, recall of 0.87, F1-score of 0.86, and the highest LIME hit score of 79. By improving the contextual foundation, this integration enhanced the model's comprehension of technical aspects and user requirements.
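Abstract (reference translation): Non-Functional Requirements (NFRs) play a critical role in determining the overall quality and user satisfaction of software systems. Accurate identification and classification of NFRs is essential to ensure that software meets expectations for performance, usability, and reliability. However, manually identifying NFRs in documentation is time-consuming and error-prone, so automated solutions are needed. Before implementing an automated solution, a robust and comprehensive dataset is essential. To build such a dataset, we collected NFRs from various Project Charters and Open Source Software Documentation. This improved the technical depth and usability of an existing NFR dataset. We categorized the NFRs into sub-classes and identified needs using widely used large language models to facilitate automation. After classifying the NFRs, we compared the classification results of RoBERTa, CodeBERT, Gemma-2, Phi-3, Mistral-8B, and Llama-3.1-8B using various evaluation metrics, including precision, recall, F1-score, and LIME score. Among these models, Gemma-2 achieved the best results, with a precision of 0.87, recall of 0.89, F1-score of 0.88, and a LIME hit score of 78 out of 80. Phi-3 recorded a precision of 0.85, recall of 0.87, F1-score of 0.86, and the highest LIME hit score of 79. By improving the contextual foundation, this integration enhanced the model's understanding of technical aspects and user requirements.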
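The abstract reports precision, recall, and F1-score for each model. The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported figures can be sanity-checked with a few lines of Python. This is a minimal sketch, assuming the reported F1 is derived from the same aggregate precision and recall shown; if the paper instead macro-averages per-class F1 scores, the values need not match exactly. The function name `f1_score` and the `reported` table are illustrative, not taken from the paper's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (returns 0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract.
reported = {
    "Gemma-2": (0.87, 0.89, 0.88),
    "Phi-3": (0.85, 0.87, 0.86),
}

for model, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{model}: computed F1 = {f1:.2f}, reported F1 = {f1_reported:.2f}")
```

Running this prints a computed F1 of 0.88 for Gemma-2 and 0.86 for Phi-3, matching the reported values to two decimal places.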
