Fugu-MT paper translation (abstract): Tiny Inference-Time Scaling with Latent Verifiers

Paper overview: Tiny Inference-Time Scaling with Latent Verifiers

arxiv url: http://arxiv.org/abs/2603.22492v2
Date: Wed, 25 Mar 2026 08:40:14 GMT
Status: translation complete
Last updated in system: 2026-03-26 14:25:25.990217
Title: Tiny Inference-Time Scaling with Latent Verifiers
Title (reference translation): Tiny Inference-Time Scaling with Latent Verifiers
Authors: Davide Bucciarelli, Evelyn Turri, Lorenzo Baraldi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
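Abstract summary: Verifier on Hidden States (VHS) operates on the intermediate hidden representations of a Diffusion Transformer (DiT). By analyzing generator features without decoding to pixel space, VHS reduces the per-candidate verification cost. VHS improves GenEval by +2.7% at the same inference-time budget.
Reference score (independently computed attention score): 56.696619768584675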
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Inference-time scaling has emerged as an effective way to improve generative models at test time by using a verifier to score and select candidate outputs. A common choice is to employ Multimodal Large Language Models (MLLMs) as verifiers, which can improve performance but introduce substantial inference-time cost. Indeed, diffusion pipelines operate in an autoencoder latent space to reduce computation, yet MLLM verifiers still require decoding candidates to pixel space and re-encoding them into the visual embedding space, leading to redundant and costly operations. In this work, we propose Verifier on Hidden States (VHS), a verifier that operates directly on intermediate hidden representations of Diffusion Transformer (DiT) single-step generators. VHS analyzes generator features without decoding to pixel space, thereby reducing the per-candidate verification cost while improving or matching the performance of MLLM-based competitors. We show that, under tiny inference budgets with only a small number of candidates per prompt, VHS enables more efficient inference-time scaling reducing joint generation-and-verification time by 63.3%, compute FLOPs by 51% and VRAM usage by 14.5% with respect to a standard MLLM verifier, achieving a +2.7% improvement on GenEval at the same inference-time budget.
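Abstract (reference translation): Inference-time scaling has emerged as an effective way to improve generative models at test time by using a verifier to score and select candidate outputs. A common choice is to use Multimodal Large Language Models (MLLMs) as verifiers, which improves performance but adds substantial inference-time cost. Diffusion pipelines operate in an autoencoder latent space to reduce computation, yet MLLM verifiers must still decode candidates to pixel space and re-encode them into the visual embedding space, which is redundant and costly. This work proposes Verifier on Hidden States (VHS), a verifier that operates directly on the intermediate hidden representations of Diffusion Transformer (DiT) single-step generators. VHS analyzes generator features without decoding to pixel space, reducing the per-candidate verification cost while matching or improving on the performance of MLLM-based competitors. Under tiny inference budgets with only a few candidates per prompt, VHS enables more efficient inference-time scaling, reducing joint generation-and-verification time by 63.3%, FLOPs by 51%, and VRAM usage by 14.5% relative to a standard MLLM verifier, and improving GenEval by +2.7% at the same inference-time budget.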
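
The verifier-based selection the abstract describes can be pictured as best-of-N: generate a few candidates, score each one, keep the top scorer. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation. `best_of_n`, `toy_latent_verifier`, and `candidate_multiplier` are hypothetical names; the "hidden states" are made-up two-element tuples; and the linear probe merely stands in for a verifier that reads generator features instead of decoded pixels. The last helper assumes the reported 63.3% time reduction applies uniformly per candidate, which the abstract does not state.

```python
from typing import Callable, Sequence, Tuple, TypeVar

C = TypeVar("C")  # candidate type, e.g. a latent tensor or hidden-state vector


def best_of_n(candidates: Sequence[C],
              verifier: Callable[[C], float]) -> Tuple[int, float]:
    """Score every candidate and return (index, score) of the best one."""
    if not candidates:
        raise ValueError("need at least one candidate")
    scores = [verifier(c) for c in candidates]
    # max() keeps the first maximal element, so ties go to the earliest candidate.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def toy_latent_verifier(hidden: Sequence[float]) -> float:
    """Hypothetical stand-in for a latent verifier: a fixed linear probe."""
    weights = (1.0, -1.0)  # made-up weights, for illustration only
    return sum(w * h for w, h in zip(weights, hidden))


def candidate_multiplier(time_reduction: float) -> float:
    """How many times more candidates fit in a fixed time budget,
    ASSUMING the reduction applies uniformly to each candidate."""
    if not 0.0 <= time_reduction < 1.0:
        raise ValueError("time_reduction must be in [0, 1)")
    return 1.0 / (1.0 - time_reduction)


if __name__ == "__main__":
    hidden_states = [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]  # toy "hidden states"
    idx, score = best_of_n(hidden_states, toy_latent_verifier)
    print(f"selected candidate {idx} with score {score:.2f}")
    print(f"candidates per budget: x{candidate_multiplier(0.633):.2f}")
```

Because the verifier is passed in as a plain callable, the same selection loop would work with an MLLM scorer on decoded images or a lightweight scorer on hidden states; only the per-candidate cost changes, which is the quantity the abstract reports on.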

Related papers