Fugu-MT Paper Translation (Summary): Instance-Optimal Estimation with Multiple LLM Judges on a Budget

Paper summary: Instance-Optimal Estimation with Multiple LLM Judges on a Budget

arxiv url: http://arxiv.org/abs/2605.23362v1
Date: Fri, 22 May 2026 08:26:08 GMT
Status: Translation complete
Last updated in system: 2026-05-25 17:29:20.26238
Title: Instance-Optimal Estimation with Multiple LLM Judges on a Budget
Title (reference translation): Instance-optimal estimation using multiple LLM judges on a budget
Authors: Junghyun Lee, Sanghwa Kim, Yassir Jedra, Alexandre Proutière, Se-Young Yun
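Abstract summary: We formalize this problem as *budgeted heteroskedastic multi-judge estimation*. Given $K$ prompt-response pairs and $J$ judges with known costs and unknown query-judge variances, the goal is to estimate a bounded score vector while minimizing the $\ell_p$-error. We show that EST-IVWE matches the oracle IVWE rate up to lower-order terms in the budget.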
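The formulation summarized above can be written out as the display below. The notation ($\mu_k$, $n_{kj}$, $c_j$, $\sigma_{kj}^2$, $B$), the additive-noise model with independent queries, and the expected-error form of the objective are illustrative assumptions, not the paper's exact definitions. The IVWE line is the standard inverse-variance weighted mean; it needs the variances $\sigma_{kj}^2$, which is why the unknown-variance setting calls for an estimated-variance variant such as EST-IVWE.

```latex
% Hypothetical notation: pair k = 1..K, judge j = 1..J, query t = 1..n_{kj};
% \bar Y_{kj} is the mean of the n_{kj} scores judge j gives to pair k.
\begin{aligned}
  &Y_{kj}^{(t)} = \mu_k + \varepsilon_{kj}^{(t)}, \quad
   \mathbb{E}\big[\varepsilon_{kj}^{(t)}\big] = 0, \quad
   \operatorname{Var}\big(\varepsilon_{kj}^{(t)}\big) = \sigma_{kj}^2, \quad
   \mu \in [0,1]^K, \\
  &\text{budget: } \sum_{k=1}^{K} \sum_{j=1}^{J} c_j\, n_{kj} \le B, \\
  &\text{IVWE: } \hat\mu_k
    = \frac{\sum_{j=1}^{J} (n_{kj}/\sigma_{kj}^2)\, \bar Y_{kj}}
           {\sum_{j=1}^{J} n_{kj}/\sigma_{kj}^2},
   \qquad
   \operatorname{Var}(\hat\mu_k) = \Big(\sum_{j=1}^{J} \frac{n_{kj}}{\sigma_{kj}^2}\Big)^{-1}, \\
  &\text{objective (e.g.): } \min_{\{n_{kj}\}} \ \mathbb{E}\,\lVert \hat\mu - \mu \rVert_p .
\end{aligned}
```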
Reference score (attention score computed in-house): 84.31744861038106
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and the difficulty of each prompt-response pair can vary substantially. This raises a basic allocation question: under a fixed budget, how should one distribute evaluation queries across heterogeneous judges and instances to obtain the most accurate score estimates? We formalize this question as *budgeted heteroskedastic multi-judge estimation*. Given $K$ prompt-response pairs, $J$ judges with known costs, and unknown query-judge variances, the goal is to estimate a bounded score vector while minimizing an $\ell_p$-error. Our first contribution is to analyze the inverse-variance weighted estimator (IVWE) and to derive the oracle allocation that minimizes its error rate. Since this allocation depends on the unknown variances, we then address the practical unknown-variance setting by proposing EST-IVWE, an adaptive algorithm that constructs and leverages *optimistically biased* variance estimates to stabilize the empirical allocation. We prove that EST-IVWE matches the oracle IVWE rate up to lower-order terms in the budget. Our second and central theoretical contribution is a matching *local* minimax lower bound, which establishes the instance-optimality of the proposed algorithms. A key technical insight is that Fano-type high-probability arguments are too coarse for this problem: their packing construction loses the local variance structure that governs the optimal allocation. We instead use an Assouad-type in-expectation argument, based on local perturbations, which preserves this structure and yields the sharp allocation-dependent lower bound. Finally, we numerically validate the superiority of our approach over naïve uniform allocation on synthetic and HelpSteer2 datasets.
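To make the allocation question concrete, here is a minimal Python sketch assuming NumPy, independent queries, and *known* variances. It covers only the oracle setting: it is not EST-IVWE and does not reproduce the paper's derivation. The helper names `ivwe`, `ivwe_variance`, `relaxed_allocation`, and `uniform_allocation` are hypothetical. Pair $k$ receives $n_{kj}$ queries from judge $j$, which has per-query cost $c_j$ and variance $\sigma_{kj}^2$. The allocation rule rests on two extra assumptions. First, query counts are treated as real numbers. Second, the $\ell_p$ error is proxied by $\sum_k \operatorname{Var}(\hat\mu_k)^{p/2}$, which is proportional to $\mathbb{E}\lVert\hat\mu-\mu\rVert_p^p$ under Gaussian errors. Because precision $n_{kj}/\sigma_{kj}^2$ grows linearly with spend, each pair's budget $b_k$ goes to the judge attaining $v_k = \min_j c_j\sigma_{kj}^2$. A Lagrange-multiplier argument then gives $b_k \propto v_k^{p/(p+2)}$. The costs, variances, and scores in the demo are synthetic.

```python
import numpy as np


def ivwe(ybar, n, sigma2):
    """Inverse-variance weighted estimate of each pair's score.

    ybar, n, sigma2: (K, J) arrays of per-judge sample means, query counts and
    per-query variances (sigma2 > 0, treated as known). Cells with n == 0 get
    weight 0, so their ybar may be NaN. Assumes every pair has >= 1 query.
    """
    w = n / sigma2                                   # precision from judge j
    contrib = np.where(w > 0, w * ybar, 0.0)         # drop unqueried cells
    return contrib.sum(axis=1) / w.sum(axis=1)


def ivwe_variance(n, sigma2):
    """Variance of each IVWE estimate: 1 / sum_j n_kj / sigma_kj^2."""
    return 1.0 / (n / sigma2).sum(axis=1)


def relaxed_allocation(sigma2, costs, budget, p=2.0):
    """Fractional query counts under KNOWN variances (hypothetical helper).

    Within a pair, precision n / sigma^2 is linear in spend, so all of pair k's
    budget goes to the judge minimizing c_j * sigma_kj^2 =: v_k. Across pairs,
    minimizing sum_k (v_k / b_k)^(p/2) subject to sum_k b_k = B gives
    b_k proportional to v_k^(p / (p + 2)).
    """
    rows = np.arange(sigma2.shape[0])
    cost_var = sigma2 * costs[None, :]               # c_j * sigma_kj^2
    best = cost_var.argmin(axis=1)                   # cheapest precision per pair
    v = cost_var[rows, best]
    weights = v ** (p / (p + 2.0))
    b = budget * weights / weights.sum()             # budget spent on pair k
    n = np.zeros(sigma2.shape)
    n[rows, best] = b / costs[best]                  # all queries to that judge
    return n


def uniform_allocation(sigma2, costs, budget):
    """Naive baseline: equal budget for every (pair, judge) cell."""
    K, J = sigma2.shape
    return np.tile(budget / (K * J) / costs, (K, 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, B = 50, 2000.0
    costs = np.array([1.0, 3.0, 10.0])               # hypothetical judge prices
    # Hypothetical variances: pricier judges are less noisy; all <= 0.25.
    sigma2 = rng.uniform(0.01, 0.06, size=(K, 3)) * np.array([4.0, 1.5, 0.5])
    mu = rng.uniform(0.0, 1.0, size=K)               # hypothetical true scores
    for name, alloc in [("uniform", uniform_allocation),
                        ("relaxed oracle", relaxed_allocation)]:
        n = np.floor(alloc(sigma2, costs, B))        # crude integer rounding
        # Gaussian noise on the per-judge means is a simplifying assumption.
        noise_sd = np.sqrt(sigma2 / np.maximum(n, 1))
        ybar = mu[:, None] + rng.normal(size=(K, 3)) * noise_sd
        err = np.abs(ivwe(ybar, n, sigma2) - mu)
        print(f"{name:15s} sum Var = {ivwe_variance(n, sigma2).sum():.5f}, "
              f"l2 error = {np.linalg.norm(err):.4f}")
```

With $p = 2$ the rule reduces to $b_k \propto \sqrt{v_k}$, so noisier or costlier-to-judge pairs receive more budget, but less than proportionally. Flooring to integer counts is only a convenience for the demo.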
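Abstract (reference translation): Evaluation of large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations are costly. Under a fixed budget, how should evaluation queries be distributed across heterogeneous judges and instances to obtain the most accurate score estimates? We formalize this problem as *budgeted heteroskedastic multi-judge estimation*. Given $K$ prompt-response pairs, $J$ judges with known costs, and unknown query-judge variances, the goal is to estimate a bounded score vector while minimizing the $\ell_p$-error. Our first contribution is to analyze the inverse-variance weighted estimator (IVWE) and to derive the oracle allocation that minimizes its error rate. Because this allocation depends on the unknown variances, we address the practical unknown-variance setting by proposing EST-IVWE, an adaptive algorithm that constructs and leverages *optimistically biased* variance estimates to stabilize the empirical allocation. We show that EST-IVWE matches the oracle IVWE rate up to lower-order terms in the budget. The second theoretical contribution is a *local* minimax lower bound, which establishes the instance-optimality of the proposed algorithms. A key technical insight is that Fano-type high-probability arguments are too coarse for this problem: their packing constructions lose the local variance structure that governs the optimal allocation. Instead, we use an Assouad-type in-expectation argument based on local perturbations, which preserves this structure and yields a sharp allocation-dependent lower bound. Finally, we numerically validate the advantage of our approach over naïve uniform allocation on synthetic and HelpSteer2 datasets.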

Related papers list