Fugu-MT Paper Translation (Summary): Evaluating the Utility of Grounding Documents with Reference-Free LLM-based Metrics

Paper summary: Evaluating the Utility of Grounding Documents with Reference-Free LLM-based Metrics

arxiv url: http://arxiv.org/abs/2601.23129v1
Date: Fri, 30 Jan 2026 16:17:07 GMT
Status: Translation complete
Last updated in system: 2026-02-02 18:28:15.54779
Title: Evaluating the Utility of Grounding Documents with Reference-Free LLM-based Metrics
Title (reference translation): Evaluating the Utility of Grounding Documents with Reference-Free LLM Metrics
Authors: Yilun Hua, Giuseppe Castellucci, Peter Schulam, Heba Elfardy, Kevin Small
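Abstract summary: Grounding Generation Utility (GroGU) is a model-specific metric that defines utility as a function of the downstream LLM's entropy-based generation confidence. Experiments show improvements of up to 18.2 points in Mean Reciprocal Rank and up to 9.4 points in answer accuracy.
Reference score (independently computed attention score): 8.474659554619478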
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Retrieval Augmented Generation (RAG)'s success depends on the utility the LLM derives from the content used for grounding. Quantifying content utility does not have a definitive specification and existing metrics ignore model-specific capabilities and/or rely on costly annotations. In this paper, we propose Grounding Generation Utility (GroGU), a model-specific and reference-free metric that defines utility as a function of the downstream LLM's generation confidence based on entropy. Despite having no annotation requirements, GroGU is largely faithful in distinguishing ground-truth documents while capturing nuances ignored by LLM-agnostic metrics. We apply GroGU to train a query-rewriter for RAG by identifying high-utility preference data for Direct Preference Optimization. Experiments show improvements by up to 18.2 points in Mean Reciprocal Rank and up to 9.4 points in answer accuracy.
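Abstract (reference translation): The success of Retrieval Augmented Generation (RAG) depends on the utility the LLM derives from the content used for grounding. There is no definitive specification for quantifying content utility, and existing metrics ignore model-specific capabilities, rely on costly annotations, or both. This paper proposes Grounding Generation Utility (GroGU), a model-specific, reference-free metric that defines utility as a function of the downstream LLM's entropy-based generation confidence. Although it requires no annotations, GroGU is largely faithful in distinguishing ground-truth documents while capturing nuances that LLM-agnostic metrics ignore. The authors apply GroGU to train a query rewriter for RAG by identifying high-utility preference data for Direct Preference Optimization. Experiments show improvements of up to 18.2 points in Mean Reciprocal Rank and up to 9.4 points in answer accuracy.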
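
The abstract does not state GroGU's formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that "generation confidence based on entropy" can be approximated by the average Shannon entropy of the model's next-token distributions, and that a document's utility is the drop in that entropy when the document is added to the prompt. The function names (`token_entropy`, `mean_entropy`, `entropy_utility`) and the toy probability lists are hypothetical. The last function computes Mean Reciprocal Rank, the standard retrieval metric the abstract reports.

```python
import math
from typing import Optional, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(step_distributions: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over all generation steps of one answer."""
    return sum(token_entropy(d) for d in step_distributions) / len(step_distributions)


def entropy_utility(
    without_doc: Sequence[Sequence[float]],
    with_doc: Sequence[Sequence[float]],
) -> float:
    """ASSUMPTION: utility = entropy reduction when the document is provided.

    Positive values mean the model became more confident with the document.
    This is an illustrative stand-in, not the GroGU definition.
    """
    return mean_entropy(without_doc) - mean_entropy(with_doc)


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """MRR over queries; each entry is the 1-based rank of the first
    relevant document, or None when no relevant document was retrieved."""
    total = sum(1.0 / r for r in first_relevant_ranks if r is not None)
    return total / len(first_relevant_ranks)


# Toy next-token distributions for a two-token answer (made-up numbers).
no_doc = [[0.25, 0.25, 0.25, 0.25], [0.5, 0.5]]
with_doc = [[0.97, 0.01, 0.01, 0.01], [0.9, 0.1]]

utility = entropy_utility(no_doc, with_doc)
mrr = mean_reciprocal_rank([1, 2, None])
```

In this toy case the distributions are sharper with the document, so `utility` is positive. A score of this kind could in principle rank candidate documents (or the rewritten queries that retrieve them) to form chosen/rejected pairs for preference training, which is the use the abstract describes.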

Related papers