Fugu-MT paper translation (abstract): Unified Thinker: A General Reasoning Modular Core for Image Generation

Paper summary: Unified Thinker: A General Reasoning Modular Core for Image Generation

arxiv url: http://arxiv.org/abs/2601.03127v1
Date: Tue, 06 Jan 2026 15:59:33 GMT
Status: translation complete
Last updated in system: 2026-01-07 17:02:13.005067
Title: Unified Thinker: A General Reasoning Modular Core for Image Generation
Title (reference translation): Unified Thinker: A Modular Core for Image Generation
Authors: Sashuai Zhou, Qiang Zhou, Jijin Hu, Hanqing Yang, Yue Cao, Junpeng Ma, Yinchao Ma, Jun Song, Tiezheng Ge, Cheng Yu, Bo Zheng, Zhou Zhao
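Abstract summary: We propose Unified Thinker, a task-agnostic reasoning architecture for general image generation. Unified Thinker decouples a dedicated Thinker from the image Generator, enabling modular upgrades of reasoning without retraining the entire generative model. Experiments on text-to-image generation and image editing show that Unified Thinker substantially improves image reasoning and generation quality.
Reference score (independently computed attention measure): 57.665309753609144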
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite impressive progress in high-fidelity image synthesis, generative models still struggle with logic-intensive instruction following, exposing a persistent reasoning-execution gap. Meanwhile, closed-source systems (e.g., Nano Banana) have demonstrated strong reasoning-driven image generation, highlighting a substantial gap to current open-source models. We argue that closing this gap requires not merely better visual generators, but executable reasoning: decomposing high-level intents into grounded, verifiable plans that directly steer the generative process. To this end, we propose Unified Thinker, a task-agnostic reasoning architecture for general image generation, designed as a unified planning core that can plug into diverse generators and workflows. Unified Thinker decouples a dedicated Thinker from the image Generator, enabling modular upgrades of reasoning without retraining the entire generative model. We further introduce a two-stage training paradigm: we first build a structured planning interface for the Thinker, then apply reinforcement learning to ground its policy in pixel-level feedback, encouraging plans that optimize visual correctness over textual plausibility. Extensive experiments on text-to-image generation and image editing show that Unified Thinker substantially improves image reasoning and generation quality.
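Abstract (reference translation): Despite remarkable progress in high-fidelity image synthesis, generative models still struggle with logic-intensive instructions, exposing a persistent reasoning-execution gap. Meanwhile, closed-source systems (e.g., Nano Banana) have demonstrated strong reasoning-driven image generation, highlighting a large gap to current open-source models. We argue that closing this gap requires not merely better visual generators but executable reasoning. To this end, we propose Unified Thinker, a task-agnostic reasoning architecture for general image generation, designed as a unified planning core that can plug into diverse generators and workflows. Unified Thinker decouples a dedicated Thinker from the image Generator, enabling modular upgrades of reasoning without retraining the entire generative model. We first build a structured planning interface for the Thinker, then apply reinforcement learning to ground its policy in pixel-level feedback, encouraging plans that optimize visual correctness over textual plausibility. Extensive experiments on text-to-image generation and image editing show that Unified Thinker substantially improves image reasoning and generation quality.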

Related papers