Fugu-MT Paper Translation (Abstract): User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models

Paper overview: User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models

arxiv url: http://arxiv.org/abs/2605.12657v1
Date: Tue, 12 May 2026 19:05:04 GMT
Status: Translation complete
System update date: 2026-05-14 23:30:27.631047
Title: User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models
Authors: Cedric Wellhausen, Laura Reinhardt, Kurt Schneider
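Abstract summary: This paper provides a dataset of 300 user reviews containing usability-relevant aspects from three different types of apps. It also analyzes whether the performance of LLMs in understanding user reviews is comparable to that of human raters.
Reference score (independently computed attention score): 1.6498033620778052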
License: http://creativecommons.org/licenses/by/4.0/
Abstract: It is known that user-centered approaches to requirements engineering in general lead to a better suited product for the end-users. LLM4RE provides promising approaches to support the requirements elicitation process (e.g., classification of requirements). Previous approaches focus on Machine-Learning (ML) or Deep-Learning (DL) aspects, which require intensive training with a large amount of manually labeled data. LLMs, on the other hand, are pre-trained on large amounts of user-generated text data, enabling a user-centric workflow to analyze requirements. In this paper, we explore the possibility of exploiting the improved natural language understanding of LLMs, rather than strict ML classification, together with the mass extraction of user reviews, to analyze whether the performance of LLMs in understanding user reviews is comparable to the performance of human raters. This enables a quick and cheap workflow for development teams to gather and process their users' requirements. This paper provides three major contributions: (1) We provide a completely coded dataset of 300 user reviews containing usability-relevant aspects from three different types of apps, which were labeled by two human raters and by an LLM. (2) We build an initial prompt, based on two prompt engineering iterations and specifically developed coding guidelines derived from the 10 Nielsen Usability Heuristics, for LLMs to filter usability-relevant user reviews. (3) We determine that LLMs are generally able to recognize usability as a non-functional requirement in user reviews, in terms of their F-score, but the performance and reliability are strongly dependent on the prompt.
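The abstract reports LLM performance in terms of an F-score against human raters but does not say here how it was computed. The following is a minimal sketch of one common way to do it, assuming binary labels (usability-relevant or not) and treating the human labels as the reference. The function name `f_score` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def f_score(gold: Sequence[bool], predicted: Sequence[bool], beta: float = 1.0) -> float:
    """F-beta score of predicted labels against gold labels.

    True means "usability-relevant". Returns 0.0 when there are no true positives.
    """
    if len(gold) != len(predicted):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # both say relevant
    fp = sum(1 for g, p in pairs if not g and p)    # only the LLM says relevant
    fn = sum(1 for g, p in pairs if g and not p)    # only the human says relevant

    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy labels invented for illustration only (not data from the paper).
human_labels = [True, True, False, True, False, False]
llm_labels = [True, False, False, True, True, False]

score = f_score(human_labels, llm_labels)
print(f"F1 = {score:.3f}")
```

With two human raters, as in the paper's dataset, the reference labels would first have to be reconciled (for example by discussion or majority rule); how the authors did this is not stated in the abstract.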


Related papers