FineControlNet: Fine-level Text Control for Image Generation with
Spatially Aligned Text Control Injection
- URL: http://arxiv.org/abs/2312.09252v1
- Date: Thu, 14 Dec 2023 18:59:43 GMT
- Title: FineControlNet: Fine-level Text Control for Image Generation with
Spatially Aligned Text Control Injection
- Authors: Hongsuk Choi, Isaac Kasahara, Selim Engin, Moritz Graule, Nikhil
Chavan-Dafle, and Volkan Isler
- Abstract summary: FineControlNet provides fine control over each instance's appearance while maintaining the precise pose control capability.
FineControlNet achieves superior performance in generating images that follow the user-provided instance-specific text prompts and poses.
- Score: 28.65209293141492
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently introduced ControlNet has the ability to steer the text-driven image
generation process with geometric input such as human 2D pose or edge
features. While ControlNet provides control over the geometric form of the
instances in the generated image, it lacks the capability to dictate the visual
appearance of each instance. We present FineControlNet to provide fine control
over each instance's appearance while maintaining the precise pose control
capability. Specifically, we develop and demonstrate FineControlNet with
geometric control via human pose images and appearance control via
instance-level text prompts. The spatial alignment of instance-specific text
prompts and 2D poses in latent space enables the fine control capabilities of
FineControlNet. We evaluate the performance of FineControlNet with rigorous
comparison against state-of-the-art pose-conditioned text-to-image diffusion
models. FineControlNet achieves superior performance in generating images that
follow the user-provided instance-specific text prompts and poses compared with
existing methods. Project webpage:
https://samsunglabs.github.io/FineControlNet-project-page
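The core mechanism described in the abstract, spatially aligning instance-specific text conditioning with each instance's 2D pose in latent space, can be illustrated with a minimal sketch. This is an illustrative assumption, not the authors' implementation: the function name, the binary pose masks, and the masked-composition scheme are hypothetical stand-ins for the paper's actual injection method.

```python
import numpy as np

def compose_instance_noise(eps_instances, masks, eps_base):
    """Blend per-instance noise predictions using spatial masks.

    eps_instances: list of (H, W, C) noise predictions, each conditioned
                   on one instance-specific text prompt (hypothetical).
    masks:         list of (H, W) {0, 1} masks derived from each
                   instance's 2D pose (hypothetical).
    eps_base:      (H, W, C) prediction from the global/background prompt.
    """
    out = eps_base.copy()
    for eps_i, m in zip(eps_instances, masks):
        # Inside an instance's pose region, use that instance's
        # text-conditioned prediction; elsewhere keep the base.
        out = np.where(m[..., None] > 0, eps_i, out)
    return out
```

Under this reading, each instance's prompt only influences the latent region its pose occupies, which is what allows distinct appearances per person without disturbing the pose control.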
Related papers
- DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models [55.42794740244581]
We introduce DC (Decouple)-ControlNet, a framework for multi-condition image generation.
The core idea behind DC-ControlNet is to decouple control conditions, transforming global control into a hierarchical system.
For interactions between elements, we introduce the Inter-Element Controller, which accurately handles multi-element interactions.
arXiv Detail & Related papers (2025-02-20T18:01:02Z)
- ControlFace: Harnessing Facial Parametric Control for Face Rigging [31.765503860508378]
We introduce ControlFace, a novel face rigging method conditioned on 3DMM renderings that enables flexible, high-fidelity control.
We employ a dual-branch U-Net architecture: one branch, referred to as FaceNet, captures identity and fine details, while the other focuses on generation.
By training on a facial video dataset, we fully utilize FaceNet's rich representations while ensuring control adherence.
arXiv Detail & Related papers (2024-12-02T06:00:27Z)
- AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation [24.07613591217345]
Linguistic control enables effective content creation, but struggles with fine-grained control over image generation.
AnyControl develops a novel Multi-Control framework that extracts a unified multi-modal embedding to guide the generation process.
This approach enables a holistic understanding of user inputs, and produces high-quality, faithful results under versatile control signals.
arXiv Detail & Related papers (2024-06-27T07:40:59Z)
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [62.51232333352754]
Ctrl-Adapter adds diverse controls to any image/video diffusion model through the adaptation of pretrained ControlNets.
With six diverse U-Net/DiT-based image/video diffusion models, Ctrl-Adapter matches the performance of pretrained ControlNets on COCO.
arXiv Detail & Related papers (2024-04-15T17:45:36Z)
- SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions [59.53867290769282]
We present a novel T2I generation method dubbed SmartControl that modifies rough visual conditions to adapt them to the text prompt.
The key idea of SmartControl is to relax the visual condition in areas that conflict with the text prompts.
Experiments on four typical visual condition types clearly show the efficacy of our SmartControl against state-of-the-arts.
arXiv Detail & Related papers (2024-04-09T16:53:43Z)
- Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control [20.533597112330018]
We show the limitations of ControlNet for the layout-to-image task and enable it to use localized descriptions.
We develop a novel cross-attention manipulation method in order to maintain image quality while improving control.
arXiv Detail & Related papers (2024-02-20T22:15:13Z)
- ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems [19.02295657801464]
In this work, we take an existing controlling network (ControlNet) and change the communication between the controlling network and the generation process to be high-frequency and large-bandwidth.
We outperform state-of-the-art approaches for pixel-level guidance, such as depth, canny-edges, and semantic segmentation, and are on a par for loose keypoint-guidance of human poses.
All code and pre-trained models will be made publicly available.
arXiv Detail & Related papers (2023-12-11T17:58:06Z)
- Fine-grained Controllable Video Generation via Object Appearance and Context [74.23066823064575]
We propose fine-grained controllable video generation (FACTOR) to achieve detailed control.
FACTOR aims to control objects' appearances and context, including their location and category.
Our method achieves controllability of object appearances without finetuning, which reduces the per-subject optimization efforts for the users.
arXiv Detail & Related papers (2023-12-05T17:47:33Z)
- Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models [82.19740045010435]
We introduce Uni-ControlNet, a unified framework that allows for the simultaneous utilization of different local controls and global controls.
Unlike existing methods, Uni-ControlNet only requires the fine-tuning of two additional adapters upon frozen pre-trained text-to-image diffusion models.
Uni-ControlNet demonstrates its superiority over existing methods in terms of controllability, generation quality and composability.
arXiv Detail & Related papers (2023-05-25T17:59:58Z)
- UniControl: A Unified Diffusion Model for Controllable Visual Generation in the Wild [166.25327094261038]
We introduce UniControl, a new generative foundation model for controllable condition-to-image (C2I) tasks.
UniControl consolidates a wide array of C2I tasks within a singular framework, while still allowing for arbitrary language prompts.
Trained on nine unique C2I tasks, UniControl demonstrates impressive zero-shot generation abilities.
arXiv Detail & Related papers (2023-05-18T17:41:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.