Fugu-MT Paper Translation (Abstract): ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation

Paper summary: ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation

arxiv url: http://arxiv.org/abs/2512.01457v1
Date: Mon, 01 Dec 2025 09:44:31 GMT
Status: Translation complete
System update date: 2025-12-02 19:46:34.784488
Title: ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation
Authors: Rohin Manvi, Joey Hong, Tim Seyde, Maxime Labonne, Mathias Lechner, Sergey Levine
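Abstract summary: ZIP-RC is an adaptive inference method that equips models with zero-overhead inference-time predictions of reward and cost. ZIP-RC improves accuracy by up to 12% over majority voting at equal or lower average cost.
Reference score (site's own attention metric): 57.799425838564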
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models excel at reasoning but lack key aspects of introspection, including anticipating their own success and the computation required to achieve it. Humans use real-time introspection to decide how much effort to invest, when to make multiple attempts, when to stop, and when to signal success or failure. Without this, LLMs struggle to make intelligent meta-cognition decisions. Test-time scaling methods like Best-of-N drive up cost and latency by using a fixed budget of samples regardless of the marginal benefit of each one at any point in generation, and the absence of confidence signals can mislead people, prevent appropriate escalation to better tools, and undermine trustworthiness. Learned verifiers or reward models can provide confidence estimates, but do not enable adaptive inference and add substantial cost by requiring extra models or forward passes. We present ZIP-RC, an adaptive inference method that equips models with zero-overhead inference-time predictions of reward and cost. At every token, ZIP-RC reuses reserved or unused logits in the same forward pass as next-token prediction to output a joint distribution over final reward and remaining length, with no extra models, architecture changes, or inference overhead. This full joint distribution is used to compute a sampling utility, which is a linear combination of the expected maximum reward, total compute, and latency of a set of samples if generated to completion. During inference, we maximize this utility with meta-actions that determine which prefix of tokens to continue or initiate sampling from. On mixed-difficulty mathematical benchmarks, ZIP-RC improves accuracy by up to 12% over majority voting at equal or lower average cost, and traces smooth Pareto frontiers between quality, compute, and latency. By providing real-time reward-cost introspection, ZIP-RC enables adaptive, efficient reasoning.
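Illustrative sketch (not from the paper): the abstract describes two steps, reading a joint distribution over final reward and remaining length from reserved logits, and combining expected maximum reward, total compute, and latency into a sampling utility. The Python below is one possible reading of those steps under loudly stated assumptions. The bin layout, the function names, the weights `alpha` and `beta`, the negative sign on the cost terms, the use of remaining length as a proxy for both compute and latency, and the independence of samples are all hypothetical choices made for illustration; the paper may define each of them differently.

```python
import math


def joint_from_reserved_logits(logits, reserved_start, n_reward_bins, n_length_bins):
    """Softmax over a reserved slice of the logit vector, viewed as [reward][length].

    Assumption: the reserved logits are laid out row-major, one row per reward bin.
    """
    n = n_reward_bins * n_length_bins
    block = logits[reserved_start:reserved_start + n]
    if len(block) != n:
        raise ValueError("logit vector is too short for the reserved block")
    m = max(block)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in block]
    z = sum(exps)
    probs = [e / z for e in exps]
    return [probs[r * n_length_bins:(r + 1) * n_length_bins]
            for r in range(n_reward_bins)]


def marginals(joint):
    """Return (reward marginal, length marginal) of a [reward][length] table."""
    reward = [sum(row) for row in joint]
    length = [sum(col) for col in zip(*joint)]
    return reward, length


def expected_max(values, dists):
    """E[max_i X_i] for independent discrete X_i on a shared ascending support."""
    cdfs = [0.0] * len(dists)
    prev_cdf_of_max = 0.0
    total = 0.0
    for k, v in enumerate(values):
        cdf_of_max = 1.0
        for i, d in enumerate(dists):
            cdfs[i] += d[k]
            cdf_of_max *= min(cdfs[i], 1.0)  # P(max <= v) = prod_i P(X_i <= v)
        total += v * (cdf_of_max - prev_cdf_of_max)
        prev_cdf_of_max = cdf_of_max
    return total


def sampling_utility(joints, reward_values, length_values, alpha, beta):
    """Hypothetical utility: E[max reward] - alpha * E[compute] - beta * E[latency].

    Assumptions: samples are independent, compute is the sum of remaining
    lengths, and latency is the longest remaining length.
    """
    if not joints:
        raise ValueError("need at least one sample")
    reward_m, length_m = zip(*(marginals(j) for j in joints))
    exp_best_reward = expected_max(reward_values, reward_m)
    exp_compute = sum(sum(v * p for v, p in zip(length_values, lm))
                      for lm in length_m)
    exp_latency = expected_max(length_values, length_m)
    return exp_best_reward - alpha * exp_compute - beta * exp_latency


# Toy usage: 2 reward bins x 2 length bins stored after 3 ordinary vocabulary logits.
toy_logits = [1.5, -0.3, 0.7, 0.0, 0.0, 0.0, 0.0]
joint = joint_from_reserved_logits(toy_logits, reserved_start=3,
                                   n_reward_bins=2, n_length_bins=2)
reward_values = [0.0, 1.0]      # wrong / correct
length_values = [100.0, 200.0]  # remaining tokens per length bin
u_one = sampling_utility([joint], reward_values, length_values, 0.001, 0.001)
u_two = sampling_utility([joint, joint], reward_values, length_values, 0.001, 0.001)
print(u_one, u_two)
```

With these toy numbers the utility is about 0.20 for one sample and about 0.275 for two, so under the assumed weights starting a second sample would be worthwhile. A meta-action policy of the kind the abstract mentions would compare such utilities across candidate sets of prefixes; that selection step is not sketched here.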
