Composer: Creative and Controllable Image Synthesis with Composable
Conditions
- URL: http://arxiv.org/abs/2302.09778v2
- Date: Wed, 22 Feb 2023 02:14:55 GMT
- Title: Composer: Creative and Controllable Image Synthesis with Composable
Conditions
- Authors: Lianghua Huang, Di Chen, Yu Liu, Yujun Shen, Deli Zhao, Jingren Zhou
- Abstract summary: Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability.
This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity.
- Score: 57.78533372393828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent large-scale generative models learned on big data are capable of
synthesizing incredible images yet suffer from limited controllability. This
work offers a new generation paradigm that allows flexible control of the
output image, such as spatial layout and palette, while maintaining the
synthesis quality and model creativity. With compositionality as the core idea,
we first decompose an image into representative factors, and then train a
diffusion model with all these factors as the conditions to recompose the
input. At the inference stage, the rich intermediate representations work as
composable elements, leading to a huge design space (i.e., exponentially
proportional to the number of decomposed factors) for customizable content
creation. It is noteworthy that our approach, which we call Composer, supports
various levels of conditions, such as text description as the global
information, depth map and sketch as the local guidance, color histogram for
low-level details, etc. Besides improving controllability, we confirm that
Composer serves as a general framework and facilitates a wide range of
classical generative tasks without retraining. Code and models will be made
available.
Related papers
- AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks [23.041812897803034]
We propose Any Synth, a unified framework capable of generating arbitrary type of synthetic data.
We have validated our framework's performance across various tasks, including Few-shot Object Detection, Cross-domain Object Detection, Zero-shot Image Retrieval, and Multi-modal Image Perception and Grounding.
arXiv Detail & Related papers (2024-11-24T04:49:07Z) - Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis [43.481539150288434]
This work introduces a new family of.
factor graph Diffusion Models (FG-DMs)
FG-DMs models the joint distribution of.
images and conditioning variables, such as semantic, sketch,.
deep or normal maps via a factor graph decomposition.
arXiv Detail & Related papers (2024-10-29T00:54:00Z) - ControlCom: Controllable Image Composition using Diffusion Model [45.48263800282992]
We propose a controllable image composition method that unifies four tasks in one diffusion model.
We also propose a local enhancement module to enhance the foreground details in the diffusion model.
The proposed method is evaluated on both public benchmark and real-world data.
arXiv Detail & Related papers (2023-08-19T14:56:44Z) - Cones 2: Customizable Image Synthesis with Multiple Subjects [50.54010141032032]
We study how to efficiently represent a particular subject as well as how to appropriately compose different subjects.
By rectifying the activations in the cross-attention map, the layout appoints and separates the location of different subjects in the image.
arXiv Detail & Related papers (2023-05-30T18:00:06Z) - Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis [77.23998762763078]
We present Frido, a Feature Pyramid Diffusion model performing a multi-scale coarse-to-fine denoising process for image synthesis.
Our model decomposes an input image into scale-dependent vector quantized features, followed by a coarse-to-fine gating for producing image output.
We conduct extensive experiments over various unconditioned and conditional image generation tasks, ranging from text-to-image synthesis, layout-to-image, scene-graph-to-image, to label-to-image.
arXiv Detail & Related papers (2022-08-29T17:37:29Z) - Semantic Palette: Guiding Scene Generation with Class Proportions [34.746963256847145]
We introduce a conditional framework with novel architecture designs and learning objectives, which effectively accommodates class proportions to guide the scene generation process.
Thanks to the semantic control, we can produce layouts close to the real distribution, helping enhance the whole scene generation process.
We demonstrate the merit of our approach for data augmentation: semantic segmenters trained on real layout-image pairs outperform models only trained on real pairs.
arXiv Detail & Related papers (2021-06-03T07:04:00Z) - Person-in-Context Synthesiswith Compositional Structural Space [59.129960774988284]
We propose a new problem, textbfPersons in Context Synthesis, which aims to synthesize diverse person instance(s) in consistent contexts.
The context is specified by the bounding box object layout which lacks shape information, while pose of the person(s) by keypoints which are sparsely annotated.
To handle the stark difference in input structures, we proposed two separate neural branches to attentively composite the respective (context/person) inputs into shared compositional structural space''
This structural space is then decoded to the image space using multi-level feature modulation strategy, and learned in a self
arXiv Detail & Related papers (2020-08-28T14:33:28Z) - Generative Hierarchical Features from Synthesizing Images [65.66756821069124]
We show that learning to synthesize images can bring remarkable hierarchical visual features that are generalizable across a wide range of applications.
The visual feature produced by our encoder, termed as Generative Hierarchical Feature (GH-Feat), has strong transferability to both generative and discriminative tasks.
arXiv Detail & Related papers (2020-07-20T18:04:14Z) - Example-Guided Image Synthesis across Arbitrary Scenes using Masked
Spatial-Channel Attention and Self-Supervision [83.33283892171562]
Example-guided image synthesis has recently been attempted to synthesize an image from a semantic label map and an exemplary image.
In this paper, we tackle a more challenging and general task, where the exemplar is an arbitrary scene image that is semantically different from the given label map.
We propose an end-to-end network for joint global and local feature alignment and synthesis.
arXiv Detail & Related papers (2020-04-18T18:17:40Z) - Synthesizing human-like sketches from natural images using a conditional
convolutional decoder [3.3504365823045035]
We propose a fully convolutional end-to-end architecture that is able to synthesize human-like sketches of objects in natural images.
We train our structure in an end-to-end supervised fashion on a collection of sketch-image pairs.
The generated sketches of our architecture can be classified with 85.6% Top-5 accuracy and we verify their visual quality via a user study.
arXiv Detail & Related papers (2020-03-16T10:42:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.