Fugu-MT Paper Translation (Abstract): Condition Weaving Meets Expert Modulation: Towards Universal and Controllable Image Generation

Paper summary: Condition Weaving Meets Expert Modulation: Towards Universal and Controllable Image Generation

arxiv url: http://arxiv.org/abs/2508.17364v1
Date: Sun, 24 Aug 2025 13:47:10 GMT
Status: Translation complete
Last updated in system: 2025-08-26 18:43:45.488848
Title: Condition Weaving Meets Expert Modulation: Towards Universal and Controllable Image Generation
Title (reference translation): Condition Weaving and Expert Modulation: Toward Universal and Controllable Image Generation
Authors: Guoqing Zhang, Xingtong Ge, Lu Shi, Xin Zhang, Muqing Xue, Wanru Xu, Yigang Cen
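Abstract summary: We propose a Unified image-to-image Generation (UniGen) framework that supports diverse conditional inputs. The Condition Modulated Expert (CoMoE) module aggregates semantically similar patch features for visual representation and conditional modeling. We also propose WeaveNet, a dynamic, snake-like connection mechanism that enables effective interaction between global text-level control from the backbone and fine-grained control from the conditional branches.
Reference score (independently computed attention score): 15.746410052754749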
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The image-to-image generation task aims to produce controllable images by leveraging conditional inputs and prompt instructions. However, existing methods often train separate control branches for each type of condition, leading to redundant model structures and inefficient use of computational resources. To address this, we propose a Unified image-to-image Generation (UniGen) framework that supports diverse conditional inputs while enhancing generation efficiency and expressiveness. Specifically, to tackle the widely existing parameter redundancy and computational inefficiency in controllable conditional generation architectures, we propose the Condition Modulated Expert (CoMoE) module. This module aggregates semantically similar patch features and assigns them to dedicated expert modules for visual representation and conditional modeling. By enabling independent modeling of foreground features under different conditions, CoMoE effectively mitigates feature entanglement and redundant computation in multi-condition scenarios. Furthermore, to bridge the information gap between the backbone and control branches, we propose WeaveNet, a dynamic, snake-like connection mechanism that enables effective interaction between global text-level control from the backbone and fine-grained control from conditional branches. Extensive experiments on the Subjects-200K and MultiGen-20M datasets across various conditional image generation tasks demonstrate that our method consistently achieves state-of-the-art performance, validating its advantages in both versatility and effectiveness. The code has been uploaded to https://github.com/gavin-gqzhang/UniGen.
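Abstract (reference translation): The image-to-image generation task aims to produce controllable images by leveraging conditional inputs and prompt instructions. However, existing methods often train a separate control branch for each type of condition, which leads to redundant model structures and inefficient use of computational resources. We therefore propose a Unified image-to-image Generation (UniGen) framework that supports diverse conditional inputs while improving generation efficiency and expressiveness. Specifically, to address parameter redundancy and computational inefficiency in controllable conditional generation architectures, we propose the Condition Modulated Expert (CoMoE) module. This module aggregates semantically similar patch features and assigns them to dedicated expert modules for visual representation and conditional modeling. By enabling independent modeling of foreground features under different conditions, CoMoE effectively mitigates feature entanglement and redundant computation in multi-condition scenarios. Furthermore, to bridge the information gap between the backbone and the control branches, we propose WeaveNet, a dynamic, snake-like connection mechanism that enables effective interaction between global text-level control from the backbone and fine-grained control from the conditional branches. Extensive experiments on the Subjects-200K and MultiGen-20M datasets across various conditional image generation tasks show that the method consistently achieves state-of-the-art performance, demonstrating its advantages in both versatility and effectiveness. The code has been uploaded to https://github.com/gavin-gqzhang/UniGen.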
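
Note: to make the routing idea in the abstract concrete, the following is a minimal sketch of top-1 mixture-of-experts routing over patch features. It is not the UniGen implementation. The gating matrix, the linear experts, and the names `route_patches`, `gate_w`, and `expert_ws` are all assumptions made for illustration; the abstract does not specify how CoMoE groups patches or how conditions modulate the experts.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a d_in x d_out matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def route_patches(patches, gate_w, expert_ws):
    """Send each patch to its highest-scoring expert (top-1 routing).

    patches:   list of patch feature vectors, each of length dim
    gate_w:    dim x num_experts gating matrix (hypothetical)
    expert_ws: one dim x dim linear map per expert (hypothetical)
    Returns the expert outputs, scaled by the gate probability,
    and the expert index chosen for each patch.
    """
    outputs, assignment = [], []
    for p in patches:
        probs = softmax(matvec(p, gate_w))
        e = max(range(len(probs)), key=probs.__getitem__)
        y = matvec(p, expert_ws[e])
        outputs.append([probs[e] * v for v in y])
        assignment.append(e)
    return outputs, assignment


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts, num_patches = 4, 3, 6
    patches = [[random.gauss(0, 1) for _ in range(dim)]
               for _ in range(num_patches)]
    gate_w = [[random.gauss(0, 1) for _ in range(num_experts)]
              for _ in range(dim)]
    expert_ws = [[[random.gauss(0, 1) for _ in range(dim)]
                  for _ in range(dim)]
                 for _ in range(num_experts)]
    _, assignment = route_patches(patches, gate_w, expert_ws)
    print("expert chosen per patch:", assignment)
```

Patches with similar features receive similar gate scores and therefore tend to land on the same expert, which is the general sense in which such routing "aggregates" similar patches. A real system would typically use learned weights and batched tensor operations rather than Python lists.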


Related papers