Condition Weaving Meets Expert Modulation: Towards Universal and Controllable Image Generation
- URL: http://arxiv.org/abs/2508.17364v2
- Date: Thu, 09 Oct 2025 08:50:35 GMT
- Title: Condition Weaving Meets Expert Modulation: Towards Universal and Controllable Image Generation
- Authors: Guoqing Zhang, Xingtong Ge, Lu Shi, Xin Zhang, Muqing Xue, Wanru Xu, Yigang Cen, Jian Zhang
- Abstract summary: We propose a Unified image-to-image Generation (UniGen) framework that supports diverse conditional inputs. The Condition Modulated Expert (CoMoE) module aggregates semantically similar patch features for visual representation and conditional modeling. We also propose WeaveNet, a dynamic, snake-like connection mechanism that enables effective interaction between global text-level control from the backbone and fine-grained control from conditional branches.
- Score: 17.898556887669997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The image-to-image generation task aims to produce controllable images by leveraging conditional inputs and prompt instructions. However, existing methods often train separate control branches for each type of condition, leading to redundant model structures and inefficient use of computational resources. To address this, we propose a Unified image-to-image Generation (UniGen) framework that supports diverse conditional inputs while enhancing generation efficiency and expressiveness. Specifically, to tackle the widely existing parameter redundancy and computational inefficiency in controllable conditional generation architectures, we propose the Condition Modulated Expert (CoMoE) module. This module aggregates semantically similar patch features and assigns them to dedicated expert modules for visual representation and conditional modeling. By enabling independent modeling of foreground features under different conditions, CoMoE effectively mitigates feature entanglement and redundant computation in multi-condition scenarios. Furthermore, to bridge the information gap between the backbone and control branches, we propose WeaveNet, a dynamic, snake-like connection mechanism that enables effective interaction between global text-level control from the backbone and fine-grained control from conditional branches. Extensive experiments on the Subjects-200K and MultiGen-20M datasets across various conditional image generation tasks demonstrate that our method consistently achieves state-of-the-art performance, validating its advantages in both versatility and effectiveness. The code has been uploaded to https://github.com/gavin-gqzhang/UniGen.
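The abstract describes CoMoE as grouping semantically similar patch features and dispatching each group to a dedicated expert. A minimal sketch of that routing idea is shown below, assuming hard assignment of patches to experts by key similarity and per-expert linear maps; the names, shapes, and hard-argmax routing are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def comoe_route(patches, expert_keys, expert_weights):
    """Illustrative expert routing: each patch feature goes to the expert
    whose key it most resembles, and that expert's linear map processes it.
    (Hypothetical shapes and routing rule, not the paper's code.)"""
    # Similarity of every patch to every expert key: (num_patches, num_experts)
    sim = patches @ expert_keys.T
    assign = sim.argmax(axis=1)              # hard assignment per patch
    out = np.empty_like(patches)
    for e, w in enumerate(expert_weights):   # each expert transforms its group
        mask = assign == e
        out[mask] = patches[mask] @ w
    return out, assign

dim, n_experts, n_patches = 8, 4, 16
patches = rng.standard_normal((n_patches, dim))
keys = rng.standard_normal((n_experts, dim))
weights = [rng.standard_normal((dim, dim)) for _ in range(n_experts)]
out, assign = comoe_route(patches, keys, weights)
```

Because patches that land on the same expert are processed together and independently of the other groups, this kind of routing keeps condition-specific features from entangling, which is the motivation the abstract gives for CoMoE.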
Related papers
- MoGen: A Unified Collaborative Framework for Controllable Multi-Object Image Generation [76.94658056824422]
MoGen is a user-friendly multi-object image generation method. First, we design a Regional Semantic Anchor (RSA) module that precisely anchors phrase units in language descriptions to their corresponding image regions. We introduce an Adaptive Multi-modal Guidance (AMG) module, which adaptively parses and integrates various combinations of multi-source control signals.
arXiv Detail & Related papers (2026-01-09T05:57:48Z)
- Context-Aware Autoregressive Models for Multi-Conditional Image Generation [24.967166342680112]
ContextAR is a flexible and effective framework for multi-conditional image generation. It embeds diverse conditions directly into the token sequence, preserving modality-specific semantics. It achieves competitive performance relative to diffusion-based multi-conditional control approaches and existing autoregressive baselines.
arXiv Detail & Related papers (2025-05-18T07:27:02Z)
- UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation [64.8341372591993]
We propose a new approach to unify controllable generation within a single framework. Specifically, we propose the unified image-instruction adapter (UNIC-Adapter) built on the Multi-Modal-Diffusion Transformer architecture. Our UNIC-Adapter effectively extracts multi-modal instruction information by incorporating both conditional images and task instructions.
arXiv Detail & Related papers (2024-12-25T15:19:02Z)
- DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation [63.63429658282696]
We propose DynamicControl, which supports dynamic combinations of diverse control signals. We show that DynamicControl is superior to existing methods in terms of controllability, generation quality, and composability under various conditional controls.
arXiv Detail & Related papers (2024-12-04T11:54:57Z)
- OminiControl: Minimal and Universal Control for Diffusion Transformer [68.3243031301164]
We present OminiControl, a novel approach that rethinks how image conditions are integrated into Diffusion Transformer (DiT) architectures. OminiControl addresses these limitations through three key innovations.
arXiv Detail & Related papers (2024-11-22T17:55:15Z)
- ControlVAR: Exploring Controllable Visual Autoregressive Modeling [48.66209303617063]
Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs). Challenges such as expensive computational cost, high inference latency, and difficulty of integration with large language models (LLMs) have necessitated exploring alternatives to DMs. This paper introduces ControlVAR, a novel framework that explores pixel-level controls in visual autoregressive modeling for flexible and efficient conditional generation.
arXiv Detail & Related papers (2024-06-14T06:35:33Z)
- OmniControlNet: Dual-stage Integration for Conditional Image Generation [61.1432268643639]
We provide a two-way integration for the widely adopted ControlNet by integrating external condition generation algorithms into a single dense prediction method.
Our proposed OmniControlNet consolidates 1) the condition generation by a single multi-tasking dense prediction algorithm under the task embedding guidance and 2) the image generation process for different conditioning types under the textual embedding guidance.
arXiv Detail & Related papers (2024-06-09T18:03:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.