Fugu-MT Paper Translation (Abstract): (1D) Ordered Tokens Enable Efficient Test-Time Search

Paper overview: (1D) Ordered Tokens Enable Efficient Test-Time Search

arxiv url: http://arxiv.org/abs/2604.15453v1
Date: Thu, 16 Apr 2026 18:13:48 GMT
Status: Translation complete
System update date: 2026-04-20 22:00:19.61553
Title: (1D) Ordered Tokens Enable Efficient Test-Time Search
Title (reference translation): (1D) Ordered Tokens That Enable Efficient Test-Time Search
Authors: Zhitong Gao, Parham Rezaei, Ali Cy, Mingqiao Ye, Nataša Jovanović, Jesse Allardice, Afshin Dehghan, Amir Zamir, Roman Bachmann, Oğuzhan Fatih Kar
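Abstract summary: Tokenization is a key component of autoregressive (AR) generative models. The paper examines whether token structure affects the ability to steer generation through test-time search.
Reference score (attention score computed independently by this site): 17.29070569167214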
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tokenization is a key component of autoregressive (AR) generative models, converting raw data into more manageable units for modeling. Commonly, tokens describe local information, such as regions of pixels in images or word pieces in text, and AR generation predicts these tokens in a fixed order. A worthwhile question is whether token structures affect the ability to steer the generation through test-time search, where multiple candidate generations are explored and evaluated by a verifier. Using image generation as our testbed, we hypothesize that recent 1D ordered tokenizers with coarse-to-fine structure can be more amenable to search than classical 2D grid structures. This is rooted in the fact that the intermediate states in coarse-to-fine sequences carry semantic meaning that verifiers can reliably evaluate, enabling effective steering during generation. Through controlled experiments, we find that AR models trained on coarse-to-fine ordered tokens exhibit improved test-time scaling behavior compared to grid-based counterparts. Moreover, we demonstrate that, thanks to the ordered structure, pure test-time search over token sequences (i.e., without training an AR model) can perform training-free text-to-image generation when guided by an image-text verifier. Beyond this, we systematically study how classical search algorithms (best-of-N, beam search, lookahead search) interact with different token structures, as well as the role of different verifiers and AR priors. Our results highlight the impact of token structure on inference-time scalability and provide practical guidance for test-time scaling in AR models.
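Abstract (reference translation): Tokenization is a key component of autoregressive (AR) generative models; it converts raw data into more manageable units for modeling. Tokens typically describe local information, such as regions of pixels in an image or word pieces in text, and AR generation predicts these tokens in a fixed order. A meaningful question is whether token structure affects the ability to steer generation through test-time search, in which multiple candidate generations are explored and scored by a verifier. Using image generation as the testbed, the authors hypothesize that recent 1D ordered tokenizers with coarse-to-fine structure are more amenable to search than conventional 2D grid structures. The reason is that intermediate states in a coarse-to-fine sequence carry semantic meaning that a verifier can evaluate reliably, which enables effective steering during generation. Controlled experiments show that AR models trained on coarse-to-fine ordered tokens exhibit better test-time scaling behavior than grid-based counterparts. Thanks to the ordered structure, pure test-time search over token sequences (that is, without training an AR model) can also perform training-free text-to-image generation when guided by an image-text verifier. The authors further study systematically how classical search algorithms (best-of-N, beam search, lookahead search) interact with different token structures, and what role different verifiers and AR priors play. The results highlight the impact of token structure on inference-time scalability and offer practical guidance for test-time scaling in AR models.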
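
As an illustration of the search procedures the abstract names, the following is a minimal sketch of best-of-N and beam search driven by a verifier. It is not the authors' implementation. The names `best_of_n`, `beam_search`, `toy_verifier`, `toy_sampler`, and `TARGET` are hypothetical, the verifier is a toy distance to a fixed target sequence standing in for an image-text verifier, and the beam search enumerates every token instead of sampling from an AR prior.

```python
import random
from typing import Callable, List, Sequence

Tokens = List[int]


def best_of_n(sample: Callable[[random.Random], Tokens],
              verifier: Callable[[Sequence[int]], float],
              n: int, rng: random.Random) -> Tokens:
    """Draw n complete sequences and keep the one the verifier scores highest."""
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=verifier)


def beam_search(vocab_size: int, length: int,
                verifier: Callable[[Sequence[int]], float],
                beam_width: int) -> Tokens:
    """Extend each kept prefix by every token, then keep the top-scoring prefixes.

    Pruning on prefixes only helps when the verifier's score of a partial
    sequence is informative, which is the property the abstract attributes
    to coarse-to-fine ordered tokens.
    """
    beams: List[Tokens] = [[]]
    for _ in range(length):
        expanded = [prefix + [tok] for prefix in beams for tok in range(vocab_size)]
        expanded.sort(key=verifier, reverse=True)
        beams = expanded[:beam_width]
    return beams[0]


# Toy stand-ins (assumptions for illustration only).
TARGET = [3, 1, 4, 1, 5]


def toy_verifier(seq: Sequence[int]) -> float:
    """Score a (possibly partial) sequence by negative distance to TARGET."""
    return -float(sum(abs(a - b) for a, b in zip(seq, TARGET)))


def toy_sampler(rng: random.Random) -> Tokens:
    """Sample a full-length sequence uniformly at random (no learned prior)."""
    return [rng.randrange(8) for _ in TARGET]
```

In this toy setting the verifier scores every prefix meaningfully, so beam search can prune early, while best-of-N only scores complete sequences. With a verifier that is uninformative on prefixes, the same beam search would prune essentially at random.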

Related papers