Fugu-MT Paper Translation (Abstract): Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics

Paper overview: Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics

arxiv url: http://arxiv.org/abs/2603.05832v1
Date: Fri, 06 Mar 2026 02:30:55 GMT
Status: Translation complete
Last updated in system: 2026-03-09 13:17:44.894583
Title: Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics
Title (reference translation): Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics
Authors: Srishti Palani, Vidya Setlur
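Abstract summary: Large Language Models (LLMs) are transforming Conversational Visual Analytics (CVA) by enabling data analysis through natural language. Evaluating LLMs for CVA currently requires programming expertise and overlooks real-world complexity. This paper introduces Lexara, a user-centered evaluation toolkit for CVA.
Reference score (independently computed attention measure): 15.251820893047467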
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are transforming Conversational Visual Analytics (CVA) by enabling data analysis through natural language. However, evaluating LLMs for CVA remains a challenge: requiring programming expertise, overlooking real-world complexity, and lacking interpretable metrics for multi-format (visualizations and text) outputs. Through interviews with 22 CVA developers and 16 end-users, we identified use cases, evaluation criteria and workflows. We present Lexara, a user-centered evaluation toolkit for CVA that operationalizes these insights into: (i) test cases spanning real-world scenarios; (ii) interpretable metrics covering visualization quality (data fidelity, semantic alignment, functional correctness, design clarity) and language quality (factual grounding, analytical reasoning, conversational coherence) using rule-based and LLM-as-a-Judge methods; and (iii) an interactive toolkit enabling experimental setup and multi-format and multi-level exploration of results without programming expertise. We conducted a two-week diary study with six CVA developers, drawn from our initial cohort of 22. Their feedback demonstrated Lexara's effectiveness for guiding appropriate model and prompt selection.
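Abstract (reference translation): Large Language Models (LLMs) are transforming Conversational Visual Analytics (CVA) by enabling data analysis through natural language. However, evaluating LLMs for CVA remains a challenge: it requires programming expertise, overlooks real-world complexity, and lacks interpretable metrics for multi-format (visualization and text) outputs. Through interviews with 22 CVA developers and 16 end-users, we identified use cases, evaluation criteria, and workflows. We present Lexara, a user-centered evaluation toolkit for CVA that operationalizes these insights into: (i) test cases spanning real-world scenarios; (ii) interpretable metrics, computed with rule-based and LLM-as-a-Judge methods, covering visualization quality (data fidelity, semantic alignment, functional correctness, design clarity) and language quality (factual grounding, analytical reasoning, conversational coherence); and (iii) an interactive toolkit that enables experimental setup and multi-format, multi-level exploration of results without programming expertise. We conducted a two-week diary study with six CVA developers drawn from the initial cohort of 22. Their feedback demonstrated Lexara's effectiveness in guiding appropriate model and prompt selection.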
