Fugu-MT Paper Translation (Summary): LaQual: A Novel Framework for Automated Evaluation of LLM App Quality

Paper summary: LaQual: A Novel Framework for Automated Evaluation of LLM App Quality

arxiv url: http://arxiv.org/abs/2508.18636v1
Date: Tue, 26 Aug 2025 03:25:49 GMT
Status: Translation complete
Last updated in system: 2025-08-27 17:42:38.65916
Title: LaQual: A Novel Framework for Automated Evaluation of LLM App Quality
Title (reference translation): LaQual: An Automated Evaluation Framework for LLM Application Quality
Authors: Yan Wang, Xinyi Hou, Yanjie Zhao, Weiguo Lin, Haoyu Wang, Junjun Si
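Abstract summary: LaQual is a framework for evaluating the quality of LLM apps. It consists of three main stages; in the first, it labels and classifies LLM apps hierarchically so that they can be matched accurately to different scenarios. Experiments on a popular LLM app store show that LaQual is effective.
Reference score (attention score computed independently by this site): 10.124358468702031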
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM app stores are quickly emerging as platforms that gather a wide range of intelligent applications based on LLMs, giving users many choices for content creation, coding support, education, and more. However, the current methods for ranking and recommending apps in these stores mostly rely on static metrics like user activity and favorites, which makes it hard for users to efficiently find high-quality apps. To address these challenges, we propose LaQual, an automated framework for evaluating the quality of LLM apps. LaQual consists of three main stages: first, it labels and classifies LLM apps in a hierarchical way to accurately match them to different scenarios; second, it uses static indicators, such as time-weighted user engagement and functional capability metrics, to filter out low-quality apps; and third, it conducts a dynamic, scenario-adaptive evaluation, where the LLM itself generates scenario-specific evaluation metrics, scoring rules, and tasks for a thorough quality assessment. Experiments on a popular LLM app store show that LaQual is effective. Its automated scores are highly consistent with human judgments (with Spearman's rho of 0.62 and p=0.006 in legal consulting, and rho of 0.60 and p=0.009 in travel planning). By effectively screening, LaQual can reduce the pool of candidate LLM apps by 66.7% to 81.3%. User studies further confirm that LaQual significantly outperforms baseline systems in decision confidence, comparison efficiency (with average scores of 5.45 compared to 3.30), and the perceived value of its evaluation reports (4.75 versus 2.25). Overall, these results demonstrate that LaQual offers a scalable, objective, and user-centered solution for finding and recommending high-quality LLM apps in real-world use cases.
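Abstract (reference translation): LLM app stores are growing rapidly as platforms that collect a wide range of intelligent LLM-based applications, offering users many options for content creation, coding support, education, and more. However, the methods these stores currently use to rank and recommend apps depend mainly on static metrics such as user activity and favorites, which makes it difficult for users to find high-quality apps. To address these challenges, we propose LaQual, an automated framework for evaluating the quality of LLM apps. LaQual consists of three main stages. First, it classifies LLM apps hierarchically so that they match different scenarios accurately. Second, it uses static indicators, such as time-weighted user engagement and functional capability metrics, to filter out low-quality apps. Third, it runs a dynamic, scenario-adaptive evaluation in which the LLM itself generates scenario-specific evaluation metrics, scoring rules, and tasks. Experiments on a popular LLM app store show that LaQual is effective. Its automated scores agree closely with human judgments (Spearman's rho of 0.62 with p=0.006 in legal consulting, and rho of 0.60 with p=0.009 in travel planning). Through effective screening, LaQual can reduce the pool of candidate LLM apps by 66.7% to 81.3%. User studies confirm that LaQual substantially outperforms baseline systems in decision confidence, comparison efficiency (average scores of 5.45 versus 3.30), and the perceived value of its evaluation reports (4.75 versus 2.25). These results demonstrate that LaQual provides a scalable, objective, and user-centered solution for finding and recommending high-quality LLM apps in real-world use cases.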
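
The agreement figures in the abstract are Spearman rank correlations between automated scores and human judgments. As a rough illustration of how such a coefficient is computed, the sketch below implements Spearman's rho in plain Python, using average ranks for ties. The function names and the sample scores are hypothetical and are not taken from the paper; the paper's own evaluation code and data are not shown on this page.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical data: automated scores and human ratings for five apps.
    automated_scores = [8.5, 6.0, 7.2, 4.1, 9.0]
    human_ratings = [4, 3, 5, 2, 5]
    print(f"rho = {spearman_rho(automated_scores, human_ratings):.3f}")
```

Because rho depends only on rank order, it measures whether the automated scores and the human raters order the apps similarly, not whether the raw scores are on the same scale. The p-values quoted in the abstract would need a separate significance test, which this sketch does not cover.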
