Fugu-MT paper translation (summary): DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation

arxiv url: http://arxiv.org/abs/2412.03255v1
Date: Wed, 04 Dec 2024 11:54:57 GMT
Status: translation complete
Last updated in system: 2024-12-05 21:42:00.654726
Title: DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
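Title (reference translation): DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
Authors: Qingdong He, Jinlong Peng, Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Yong Liu, Yabiao Wang, Chengjie Wang, Xiangtai Li, Jiangning Zhang
Abstract summary: We propose DynamicControl, which supports dynamic combinations of diverse control signals. We show that DynamicControl outperforms existing methods in controllability, generation quality, and composability under various conditions.
Reference score (attention score computed independently by this site): 63.63429658282696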
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To enhance the controllability of text-to-image diffusion models, current ControlNet-like models have explored various control signals to dictate image attributes. However, existing methods either handle conditions inefficiently or use a fixed number of conditions, which does not fully address the complexity of multiple conditions and their potential conflicts. This underscores the need for innovative approaches to manage multiple conditions effectively for more reliable and detailed image synthesis. To address this issue, we propose a novel framework, DynamicControl, which supports dynamic combinations of diverse control signals, allowing adaptive selection of different numbers and types of conditions. Our approach begins with a double-cycle controller that generates an initial real score sorting for all input conditions by leveraging pre-trained conditional generation models and discriminative models. This controller evaluates the similarity between extracted conditions and input conditions, as well as the pixel-level similarity with the source image. Then, we integrate a Multimodal Large Language Model (MLLM) to build an efficient condition evaluator. This evaluator optimizes the ordering of conditions based on the double-cycle controller's score ranking. Our method jointly optimizes MLLMs and diffusion models, utilizing MLLMs' reasoning capabilities to facilitate multi-condition text-to-image (T2I) tasks. The final sorted conditions are fed into a parallel multi-control adapter, which learns feature maps from dynamic visual conditions and integrates them to modulate ControlNet, thereby enhancing control over generated images. Through both quantitative and qualitative comparisons, DynamicControl demonstrates its superiority over existing methods in terms of controllability, generation quality and composability under various conditional controls.
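Abstract (reference translation): To improve the controllability of text-to-image diffusion models, current ControlNet-like models have explored various control signals for determining image attributes. However, existing methods either handle conditions inefficiently or use a fixed number of conditions, which does not fully resolve the complexity of multiple conditions and their potential conflicts. This highlights the need for innovative approaches that manage multiple conditions effectively for more reliable and detailed image synthesis. To address this problem, we propose DynamicControl, a new framework that supports dynamic combinations of diverse control signals. The proposed method starts with a double-cycle controller that produces an initial real score sorting for all input conditions by leveraging pre-trained conditional generation models and discriminative models. This controller evaluates the similarity between the extracted conditions and the input conditions, as well as the pixel-level similarity with the source image. Next, we integrate a Multimodal Large Language Model (MLLM) to build an efficient condition evaluator. This evaluator optimizes the ordering of conditions based on the score ranking from the double-cycle controller. Our method jointly optimizes MLLMs and diffusion models, using the reasoning capabilities of MLLMs to facilitate multi-condition text-to-image (T2I) tasks. The final sorted conditions are fed into a parallel multi-control adapter, which learns feature maps from the dynamic visual conditions and integrates them to modulate ControlNet, strengthening control over the generated images. Through quantitative and qualitative comparisons, DynamicControl is shown to outperform existing methods in controllability, generation quality, and composability under various conditions.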
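The ranking step described in the abstract can be illustrated with a small sketch. The abstract states only that the double-cycle controller scores each condition by (a) the similarity between the extracted and input conditions and (b) the pixel-level similarity with the source image; it does not say how the two terms are combined or how conditions are selected. The weighted average, the threshold, and all names below (`ConditionScore`, `combined_score`, `rank_conditions`, `select_conditions`) are assumptions made purely for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConditionScore:
    name: str                    # illustrative label, e.g. "depth" or "canny"
    condition_similarity: float  # extracted condition vs. input condition, in [0, 1]
    image_similarity: float      # generated image vs. source image, in [0, 1]


def combined_score(s: ConditionScore, weight: float = 0.5) -> float:
    """Hypothetical weighted average of the two similarity terms (an assumption)."""
    return weight * s.condition_similarity + (1.0 - weight) * s.image_similarity


def rank_conditions(scores: List[ConditionScore], weight: float = 0.5) -> List[ConditionScore]:
    """Sort conditions from highest to lowest combined score."""
    return sorted(scores, key=lambda s: combined_score(s, weight), reverse=True)


def select_conditions(scores: List[ConditionScore],
                      threshold: float = 0.6,
                      weight: float = 0.5) -> List[str]:
    """Keep a variable number of conditions: those scoring at or above the threshold."""
    return [s.name for s in rank_conditions(scores, weight)
            if combined_score(s, weight) >= threshold]


# Made-up scores for three conditions.
scores = [
    ConditionScore("depth", 0.9, 0.8),
    ConditionScore("canny", 0.4, 0.5),
    ConditionScore("segmentation", 0.7, 0.9),
]
selected = select_conditions(scores)
print(selected)  # -> ['depth', 'segmentation']
```

Because selection is by threshold rather than a fixed top-k, the number of retained conditions varies with the input, which mirrors the "different numbers and types of conditions" idea in the abstract. In the paper this ordering is further refined by an MLLM-based evaluator, which this sketch does not model.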

Related papers

LLMControl: Grounded Control of Text-to-Image Diffusion-based Synthesis with Multimodal LLMs [3.6016438645365834]
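To address the challenges of controllable T2I generation, we propose a framework called LLM_Control. LLM_Control precisely modulates pre-trained diffusion models by improving grounding capabilities. We use a multimodal LLM as a global controller to arrange spatial layouts, enrich semantic descriptions, and bind object attributes.
Paper reference translation (metadata) (2025-07-26T12:57:02Z)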
DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models [55.42794740244581]
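We introduce DC (Decouple)-ControlNet, a framework for multi-condition image generation. The core idea behind DC-ControlNet is to decouple control conditions, turning global control into a hierarchical system. For interactions between elements, we introduce an Inter-Element Controller that accurately handles multi-element interactions.
Paper reference translation (metadata) (2025-02-20T18:01:02Z)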
OminiControl: Minimal and Universal Control for Diffusion Transformer [68.3243031301164]
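OminiControl is a framework that integrates image conditions into pre-trained Diffusion Transformer (DiT) models. At its core, OminiControl leverages a parameter reuse mechanism, which allows image conditions to be encoded using the model itself as a powerful backbone. OminiControl handles a wide range of image conditioning tasks in a unified manner, including subject-driven generation and spatially aligned conditions.
Paper reference translation (metadata) (2024-11-22T17:55:15Z)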
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation [24.07613591217345]
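Language control enables effective content generation but struggles with fine-grained control over image generation. AnyControl is a new multi-control framework that extracts a unified multimodal embedding to guide the generation process. This approach enables a holistic understanding of user inputs and produces high-quality, faithful results under versatile control signals.
Paper reference translation (metadata) (2024-06-27T07:40:59Z)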
ControlVAR: Exploring Controllable Visual Autoregressive Modeling [48.66209303617063]
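With the advent of diffusion models (DMs), conditional visual generation has made remarkable progress. Challenges such as expensive computational cost, high inference latency, and the difficulty of integrating with large language models (LLMs) make it necessary to explore alternatives to DMs. This paper introduces ControlVAR, a new framework that explores pixel-level control in visual autoregressive modeling for flexible and efficient conditional generation.
Paper reference translation (metadata) (2024-06-14T06:35:33Z)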
OmniControlNet: Dual-stage Integration for Conditional Image Generation [61.1432268643639]
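We provide a dual-stage integration of the widely adopted ControlNet by consolidating the external condition generation algorithms into a single dense prediction method. The proposed OmniControlNet integrates 1) condition generation by a single multi-task dense prediction algorithm under task embedding guidance and 2) the image generation processes for different conditioning types under text embedding guidance.
Paper reference translation (metadata) (2024-06-09T18:03:47Z)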
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation [99.4649330193233]
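Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs from other modalities, such as edge maps. We propose FlexEControl, a flexible and efficient method for controllable T2I generation.
Paper reference translation (metadata) (2024-05-08T06:09:11Z)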
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback [20.910939141948123]
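ControlNet++ is a new approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls. It improves over ControlNet by 11.1% mIoU, 13.4% SSIM, and 7.6% RMSE for segmentation mask, line-art edge, and depth conditions, respectively.
Paper reference translation (metadata) (2024-04-11T17:59:09Z)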
CCM: Adding Conditional Controls to Text-to-Image Consistency Models [89.75377958996305]
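This paper explores alternative strategies for adding ControlNet-like conditional control to Consistency Models. A lightweight adapter can be jointly optimized under multiple conditions through consistency training. These three solutions span a variety of conditional controls, including edge, depth, human pose, low-resolution image, and masked image.
Paper reference translation (metadata) (2023-12-12T04:16:03Z)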

The related papers list is generated automatically from the titles and abstracts of papers on this site.
