Fugu-MT Paper Translation (Abstract): RAGVUE: A Diagnostic View for Explainable and Automated Evaluation of Retrieval-Augmented Generation

Paper overview: RAGVUE: A Diagnostic View for Explainable and Automated Evaluation of Retrieval-Augmented Generation

arxiv url: http://arxiv.org/abs/2601.04196v1
Date: Wed, 03 Dec 2025 07:42:49 GMT
Status: Translation complete
Last updated in system: 2026-01-25 16:54:51.508286
Title: RAGVUE: A Diagnostic View for Explainable and Automated Evaluation of Retrieval-Augmented Generation
Authors: Keerthana Murugaraj, Salima Lamsiyah, Martin Theobald
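Abstract summary: RAGVUE is a framework for evaluating Retrieval-Augmented Generation (RAG) systems. It decomposes RAG behavior into retrieval quality, answer relevance and completeness, strict claim-level faithfulness, and judge calibration. RAGVUE supports both manual metric selection and fully automated agentic evaluation.
Reference score (independently computed attention score): 1.564663326217051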
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Evaluating Retrieval-Augmented Generation (RAG) systems remains a challenging task: existing metrics often collapse heterogeneous behaviors into single scores and provide little insight into whether errors arise from retrieval, reasoning, or grounding. In this paper, we introduce RAGVUE, a diagnostic and explainable framework for automated, reference-free evaluation of RAG pipelines. RAGVUE decomposes RAG behavior into retrieval quality, answer relevance and completeness, strict claim-level faithfulness, and judge calibration. Each metric includes a structured explanation, making the evaluation process transparent. Our framework supports both manual metric selection and fully automated agentic evaluation. It also provides a Python API, CLI, and a local Streamlit interface for interactive usage. In comparative experiments, RAGVUE surfaces fine-grained failures that existing tools such as RAGAS often overlook. We showcase the full RAGVUE workflow and illustrate how it can be integrated into research pipelines and practical RAG development. The source code and detailed instructions on usage are publicly available on GitHub.


Related papers