Fugu-MT Paper Translation (Abstract): DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing

Paper overview: DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing

arxiv url: http://arxiv.org/abs/2604.25477v1
Date: Tue, 28 Apr 2026 10:30:01 GMT
Status: translation complete
Last updated in system: 2026-04-29 16:49:17.820102
Title: DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing
Title (reference translation): DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing
Authors: Hanqing Yang, Qiang Zhou, Yongchao Du, Sashuai Zhou, Zhibin Wang, Jun Song, Tiezheng Ge, Cheng Yu, Bo Zheng
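Abstract summary: We propose a Thinker-centric framework for independently optimizing a planning module (Thinker) on top of a fixed generative model (Editor). The framework decomposes feedback into two distinct atomic rewards implemented through verifiable checklists. Experiments on reasoning-driven image editing benchmarks, including RISE-Bench and KRIS-Bench, show that the method substantially improves overall performance.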
Reference score (independently computed attention level): 41.08605870394525
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent image editing models have achieved strong visual fidelity but often struggle with tasks requiring complex reasoning. To investigate and enhance the reasoning-grounded planning for image editing, we propose DDA-Thinker, a Thinker-centric framework designed for the independent optimization of a planning module (Thinker) over a fixed generative model (Editor). This decoupled Thinker-centric paradigm facilitates a controlled analysis of the planning module and makes its contribution under a fixed Editor easier to assess. To effectively guide this Thinker, we introduce a dual-atomic reinforcement learning framework. This framework decomposes feedback into two distinct atomic rewards implemented through verifiable checklists: a cognitive-atomic reward to directly assess the quality of the Thinker's executable plan, which serves as the actionable outcome of the Thinker's reasoning, and a visual-atomic reward to assess the final image quality. To improve checklist quality, our checklist synthesis is grounded not only in the source image and user instruction but also in a rational reference description of the ideal post-edit scene. To support this training, we further develop a two-stage data curation pipeline that first synthesizes a diverse and reasoning-focused dataset, then applies difficulty-aware refinement to curate an effective training curriculum for reinforcement learning. Extensive experiments on reasoning-driven image editing benchmarks, including RISE-Bench and KRIS-Bench, demonstrate that our approach substantially improves overall performance. Our method enables a community model to achieve results competitive with strong proprietary models, highlighting the practical potential of Thinker-centric optimization under a fixed-editor setting.
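Abstract (reference translation): Recent image editing models achieve strong visual fidelity but often struggle with tasks that require complex reasoning. DDA-Thinker is a Thinker-centric framework designed to optimize a planning module (Thinker) independently on top of a fixed generative model (Editor). This decoupled, Thinker-centric paradigm enables a controlled analysis of the planning module and makes its contribution under a fixed Editor easier to assess. To guide the Thinker effectively, we introduce a dual-atomic reinforcement learning framework. It decomposes feedback into two distinct atomic rewards implemented through verifiable checklists: a cognitive-atomic reward that directly assesses the quality of the Thinker's executable plan, which is the actionable outcome of the Thinker's reasoning, and a visual-atomic reward that assesses the quality of the final image. To improve checklist quality, checklist synthesis is grounded not only in the source image and the user instruction but also in a rational reference description of the ideal post-edit scene. To support training, we also develop a two-stage data curation pipeline that first synthesizes a diverse, reasoning-focused dataset and then applies difficulty-aware refinement to curate an effective training curriculum for reinforcement learning. Extensive experiments on reasoning-driven image editing benchmarks, including RISE-Bench and KRIS-Bench, show that the method substantially improves overall performance. The method lets a community model achieve results competitive with strong proprietary models, highlighting the practical potential of Thinker-centric optimization in a fixed-editor setting.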
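Illustrative sketch (not from the paper): the abstract says feedback is split into a cognitive-atomic reward on the Thinker's plan and a visual-atomic reward on the final image, both implemented through verifiable checklists. It does not say how a checklist is scored or how the two rewards are combined. The Python below assumes, purely for illustration, that each checklist is scored as the fraction of items passed and that the two scores are mixed by a weighted sum; `ChecklistItem`, `checklist_score`, `dual_atomic_reward`, and the weights are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChecklistItem:
    """One verifiable yes/no check (hypothetical structure)."""
    question: str  # e.g. "Does the plan say the ice has melted?"
    passed: bool   # verdict from some judge; the judge is not specified here


def checklist_score(items: List[ChecklistItem]) -> float:
    """Fraction of items that passed (assumed scoring rule); 0.0 if empty."""
    if not items:
        return 0.0
    return sum(item.passed for item in items) / len(items)


def dual_atomic_reward(
    plan_items: List[ChecklistItem],
    image_items: List[ChecklistItem],
    w_cognitive: float = 0.5,  # assumed weight, not from the paper
    w_visual: float = 0.5,     # assumed weight, not from the paper
) -> float:
    """Weighted sum of the plan score and the image score (assumed rule)."""
    cognitive = checklist_score(plan_items)   # quality of the Thinker's plan
    visual = checklist_score(image_items)     # quality of the edited image
    return w_cognitive * cognitive + w_visual * visual


plan = [
    ChecklistItem("Plan identifies the object to change", True),
    ChecklistItem("Plan describes the expected end state", True),
]
image = [
    ChecklistItem("Edited image shows the expected end state", True),
    ChecklistItem("Unrelated regions are unchanged", False),
]
reward = dual_atomic_reward(plan, image)
print(reward)
```

Keeping the two scores separate until the final step means a plan that is correct but poorly rendered by the fixed Editor can still be told apart from a plan that was wrong to begin with.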


Related papers