Fugu-MT Paper Translation (Summary): Generation then Reconstruction: Accelerating Masked Autoregressive Models via Two-Stage Sampling

Paper summary: Generation then Reconstruction: Accelerating Masked Autoregressive Models via Two-Stage Sampling

arxiv url: http://arxiv.org/abs/2510.17171v1
Date: Mon, 20 Oct 2025 05:22:10 GMT
Status: Translation complete
Last updated (in system): 2025-10-25 00:56:39.316039
Title: Generation then Reconstruction: Accelerating Masked Autoregressive Models via Two-Stage Sampling
Authors: Feihong Yan, Peiru Wang, Yao Zhu, Kaiyu Pang, Qingyan Wei, Huiqi Li, Linfeng Zhang
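Abstract summary: Masked Autoregressive (MAR) models improve visual generation efficiency over autoregressive (AR) models thanks to their parallel generation capability. We introduce Generation then Reconstruction (GtR), a training-free hierarchical sampling strategy that decomposes generation into two stages. Experiments on ImageNet class-conditional and text-to-image generation show a 3.72x speedup on MAR-H while maintaining comparable quality.
Reference score (independently computed attention score): 14.372824543814602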
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Masked Autoregressive (MAR) models promise better efficiency in visual generation than autoregressive (AR) models owing to their ability to generate in parallel, yet their acceleration potential remains constrained by the difficulty of modeling spatially correlated visual tokens in a single step. To address this limitation, we introduce Generation then Reconstruction (GtR), a training-free hierarchical sampling strategy that decomposes generation into two stages: structure generation, which establishes a global semantic scaffold, followed by detail reconstruction, which efficiently completes the remaining tokens. Assuming that it is more difficult to create an image from scratch than to complete one from a basic image framework, GtR achieves acceleration by computing the reconstruction stage quickly while preserving generation quality by computing the generation stage slowly. Moreover, observing that tokens on image details often carry more semantic information than tokens in salient regions, we further propose Frequency-Weighted Token Selection (FTS), which allocates more computation budget to detail tokens, localized by the energy of their high-frequency information. Extensive experiments on ImageNet class-conditional and text-to-image generation demonstrate a 3.72x speedup on MAR-H while maintaining comparable quality (e.g., FID: 1.59, IS: 304.4 vs. original 1.59, 299.1), substantially outperforming existing acceleration methods across various model scales and generation tasks. Our code will be released at https://github.com/feihongyan1/GtR.
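The two-stage idea described in the abstract (a slow structure-generation stage followed by a fast detail-reconstruction stage) can be illustrated as a per-step unmasking schedule. This is a minimal sketch under stated assumptions, not the authors' GtR implementation: the function names split_evenly and two_stage_schedule, the parameters structure_fraction, structure_steps and recon_steps, and the even split of tokens within each stage are all hypothetical. It only computes how many masked tokens would be predicted at each step; the MAR forward pass and the choice of which tokens to unmask are not modeled.

```python
from typing import List


def split_evenly(total: int, steps: int) -> List[int]:
    """Distribute `total` tokens over `steps` steps; earlier steps absorb the remainder."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    base, rem = divmod(total, steps)
    return [base + (1 if i < rem else 0) for i in range(steps)]


def two_stage_schedule(num_tokens: int, structure_fraction: float,
                       structure_steps: int, recon_steps: int) -> List[int]:
    """Hypothetical per-step unmasking counts: slow structure stage, then fast reconstruction."""
    if not 0.0 < structure_fraction < 1.0:
        raise ValueError("structure_fraction must lie in (0, 1)")
    n_struct = round(num_tokens * structure_fraction)
    n_recon = num_tokens - n_struct
    # Stage 1 (structure generation): few tokens per step, so many forward passes per token.
    stage1 = split_evenly(n_struct, structure_steps)
    # Stage 2 (detail reconstruction): many tokens per step, so few forward passes per token.
    stage2 = split_evenly(n_recon, recon_steps)
    return stage1 + stage2


if __name__ == "__main__":
    # Hypothetical setting: 256 tokens (e.g. a 16 x 16 grid), half of them in each stage.
    schedule = two_stage_schedule(256, 0.5, structure_steps=16, recon_steps=4)
    print(len(schedule), sum(schedule), schedule[:2], schedule[-2:])
    # → 20 256 [8, 8] [32, 32]
```

In this toy setting each step stands for one model forward pass, so the savings come from finishing the second half of the tokens in a handful of large steps. The actual stage split and step counts used by GtR are not given in the abstract and would have to come from the paper or its code.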
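Frequency-Weighted Token Selection (FTS) localizes detail tokens by the energy of their high-frequency content. The following is a minimal sketch of one way to compute such a score. It assumes that a grayscale image (for example, an intermediate preview) is available and that each token covers a non-overlapping square patch. The helper names high_freq_energy, token_energies and select_detail_tokens, the radial cutoff, the naive per-patch DFT, and the top-fraction selection rule are assumptions for illustration, not details taken from the paper. How GtR turns these scores into a larger computation budget is not specified in the abstract.

```python
import cmath
import math
from typing import List, Sequence

Matrix = List[List[float]]


def high_freq_energy(patch: Matrix, cutoff: float = 1.0) -> float:
    """Sum of |F(u, v)|^2 over DFT coefficients whose wrapped radial frequency exceeds cutoff."""
    n, m = len(patch), len(patch[0])
    energy = 0.0
    for u in range(n):
        for v in range(m):
            # Naive 2D DFT coefficient F(u, v); acceptable for tiny token patches.
            coeff = sum(
                patch[x][y] * cmath.exp(-2j * math.pi * (u * x / n + v * y / m))
                for x in range(n)
                for y in range(m)
            )
            # Wrap indices so that, e.g., u = n - 1 is treated as frequency -1.
            radius = math.hypot(min(u, n - u), min(v, m - v))
            if radius > cutoff:
                energy += abs(coeff) ** 2
    return energy


def token_energies(image: Matrix, patch: int) -> List[float]:
    """Split an H x W image into non-overlapping patch x patch tokens (row-major order)."""
    energies = []
    for r in range(0, len(image), patch):
        for c in range(0, len(image[0]), patch):
            block = [row[c:c + patch] for row in image[r:r + patch]]
            energies.append(high_freq_energy(block))
    return energies


def select_detail_tokens(energies: Sequence[float], fraction: float) -> List[int]:
    """Hypothetical rule: return indices of the top `fraction` of tokens by energy."""
    k = min(len(energies), max(1, round(fraction * len(energies))))
    ranked = sorted(range(len(energies)), key=lambda i: (-energies[i], i))
    return sorted(ranked[:k])


if __name__ == "__main__":
    # 8 x 8 toy image: flat left half, checkerboard right half -> 4 tokens of 4 x 4.
    image = [[0.0] * 4 + [float((r + c) % 2) for c in range(4)] for r in range(8)]
    print(select_detail_tokens(token_energies(image, patch=4), fraction=0.5))
    # → [1, 3]
```

On the toy image the two flat patches have no energy outside the DC term, while the checkerboard patches concentrate half of their energy at the highest frequency, so tokens 1 and 3 are selected. A practical version would use an FFT routine instead of the naive transform, which is written out here only for readability.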


Related papers