Fugu-MT Paper Translation (Summary): From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation

Paper summary: From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation

arxiv url: http://arxiv.org/abs/2604.02355v1
Date: Thu, 12 Mar 2026 12:49:26 GMT
Status: Translation complete
System update date: 2026-04-12 18:41:08.516228
Title: From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
Authors: Han Song, Yucheng Zhou, Jianbing Shen, Yu Cheng
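Abstract summary: Combining Chain-of-Thought (CoT) with Reinforcement Learning (RL) improves text-to-image (T2I) generation. This paper presents a systematic entropy-based analysis that yields three key insights. It proposes Entropy-Guided Group Relative Policy Optimization (EG-GRPO), a fine-tuning strategy that reallocates the optimization budget according to uncertainty.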
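The entropy-based analysis rests on the Shannon entropy of the model's next-token distribution at each generated position. The sketch below is a minimal illustration, not the authors' code: it assumes per-token logits are available as plain Python lists (a real autoregressive image model would expose tensors over its visual-token vocabulary), and the helper names `token_entropy`, `entropy_stats`, and `pearson` are hypothetical. It computes the mean and variance of token entropy for one generated sample, the two statistics the paper reports as strongly negatively correlated with final reward, plus a Pearson correlation helper for relating those statistics to rewards across many samples.

```python
import math
from statistics import fmean, pvariance


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution defined by `logits`."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_stats(per_token_logits):
    """Mean and population variance of token entropy across one generated sequence.

    `per_token_logits` is a hypothetical input format: one list of logits per
    generated (image) token position.
    """
    entropies = [token_entropy(logits) for logits in per_token_logits]
    return fmean(entropies), pvariance(entropies)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)
```

Running `entropy_stats` on every sample and then calling `pearson` on the per-sample means (or variances) against the corresponding rewards gives the kind of correlation the second insight refers to; `pearson` raises `ValueError` if either input is constant.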
Reference score (site-computed attention score): 53.759125791348396
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Combining Chain-of-Thought (CoT) with Reinforcement Learning (RL) improves text-to-image (T2I) generation, yet the underlying interaction between CoT's exploration and RL's optimization remains unclear. We present a systematic entropy-based analysis that yields three key insights: (1) CoT expands the generative exploration space, while RL contracts it toward high-reward regions; (2) final reward is strongly negatively correlated with both the mean and variance of image-token entropy, highlighting the need to reduce uncertainty and instability; and (3) the entropy of the textual CoT directly governs downstream image quality, with lower-entropy CoTs leading to better generations. Motivated by these findings, we propose Entropy-Guided Group Relative Policy Optimization (EG-GRPO), a fine-tuning strategy that reallocates optimization budget by uncertainty: low-entropy tokens are excluded from reward-driven updates to preserve stability, while high-entropy tokens receive an entropy bonus that encourages structured exploration without collapse. Experiments on standard T2I benchmarks demonstrate that EG-GRPO achieves state-of-the-art performance.
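To make the token-level rule concrete, here is a minimal sketch of an entropy-gated, GRPO-style objective for one prompt's group of samples. It is an illustration under stated assumptions, not the authors' implementation: the thresholds `tau_low` and `tau_high`, the bonus weight `beta`, the clipping range `clip_eps`, the `Token` record, and the function names are all hypothetical; tokens between the two thresholds are assumed to receive the ordinary clipped update; and GRPO's KL penalty is omitted. Advantages are standardized within the group, as in GRPO. The function only evaluates the scalar objective; in training it would be written in an autodiff framework so that gradients flow through the log-probabilities and the entropy bonus.

```python
import math
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class Token:
    """Per-position quantities for one generated token (hypothetical record)."""
    logp_new: float  # log-prob of the sampled token under the policy being optimized
    logp_old: float  # log-prob under the policy that generated the sample
    entropy: float   # entropy of the predictive distribution at this position


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: each reward standardized within its prompt's group."""
    mu, sigma = fmean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def eg_grpo_objective(group, rewards, tau_low, tau_high, beta=0.01, clip_eps=0.2):
    """Entropy-gated clipped surrogate for one group (higher is better).

    group:   list of samples, each a non-empty list of Token.
    rewards: one scalar reward per sample.
    Tokens with entropy < tau_low get no reward-driven term; tokens with
    entropy > tau_high additionally get a beta * entropy bonus.
    """
    advantages = group_advantages(rewards)
    per_sample = []
    for tokens, adv in zip(group, advantages):
        terms = []
        for tok in tokens:
            term = 0.0
            if tok.entropy >= tau_low:  # reward-driven term skipped for low-entropy tokens
                ratio = math.exp(tok.logp_new - tok.logp_old)
                clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
                term += min(ratio * adv, clipped * adv)  # PPO-style clipped surrogate
            if tok.entropy > tau_high:  # entropy bonus for uncertain tokens
                term += beta * tok.entropy
            terms.append(term)
        per_sample.append(fmean(terms))  # average over the sample's tokens
    return fmean(per_sample)  # average over the group, as in GRPO
```

Setting `tau_low` to 0 and `beta` to 0 recovers a plain clipped GRPO surrogate (without the KL term), which makes the two entropy rules easy to ablate separately.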


Related papers