TCIG: Two-Stage Controlled Image Generation with Quality Enhancement through Diffusion
- URL: http://arxiv.org/abs/2403.01212v1
- Date: Sat, 2 Mar 2024 13:59:02 GMT
- Title: TCIG: Two-Stage Controlled Image Generation with Quality Enhancement through Diffusion
- Authors: Salaheldin Mohamed
- Abstract summary: A two-stage method that combines controllability and high quality in image generation is proposed.
By separating controllability from high quality, this method achieves outstanding results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In recent years, significant progress has been made in the development of text-to-image generation models. However, these models still face limitations when it comes to achieving full controllability during the generation process. Often, specific training or the use of limited models is required, and even then they have certain restrictions. To address these challenges, a two-stage method that effectively combines controllability and high quality in the generation of images is proposed. This approach leverages the expertise of pre-trained models to achieve precise control over the generated images, while also harnessing the power of diffusion models to achieve state-of-the-art quality. By separating controllability from high quality, this method achieves outstanding results. It is compatible with both latent- and image-space diffusion models, ensuring versatility and flexibility. Moreover, this approach consistently produces outcomes comparable to current state-of-the-art methods. Overall, the proposed method represents a significant advancement in text-to-image generation, enabling improved controllability without compromising the quality of the generated images.
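The abstract describes the pipeline only in prose; below is a minimal sketch of the two-stage idea, assuming Hugging Face diffusers for the quality stage. The stage-one generator (generate_controlled_base), the checkpoint name, and the strength value are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the two-stage idea: controllability first, quality second.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

def generate_controlled_base(prompt: str) -> Image.Image:
    """Stage 1 (hypothetical stub): any pre-trained model that offers precise
    control over content and layout, even if its raw output quality is modest."""
    raise NotImplementedError("plug in a controllable generator here")

# Stage 2: a pre-trained diffusion model lifts the rough output to high quality;
# an image-space diffusion model could be substituted for the latent one used here.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a red car parked beside a lake, photorealistic"
base = generate_controlled_base(prompt)  # controlled but low-fidelity image
refined = pipe(
    prompt=prompt,
    image=base,
    strength=0.5,  # small enough to preserve the controlled layout
).images[0]
refined.save("refined.png")
```

Because the stages are decoupled, stage one can be swapped for any model that is good at control, and stage two for any diffusion model that is good at fidelity.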
Related papers
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- CAR: Controllable Autoregressive Modeling for Visual Generation [100.33455832783416]
Controllable AutoRegressive Modeling (CAR) is a novel, plug-and-play framework that integrates conditional control into multi-scale latent variable modeling.
CAR progressively refines and captures control representations, which are injected into each autoregressive step of the pre-trained model to guide the generation process.
Our approach demonstrates excellent controllability across various types of conditions and delivers higher image quality compared to previous methods.
arXiv Detail & Related papers (2024-10-07T00:55:42Z)
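As a rough illustration of the injection idea in the CAR summary above, here is a hedged PyTorch sketch in which a small learned adapter projects a control representation and adds it to the hidden states entering a frozen autoregressive block. The module names and the additive fusion are assumptions made for illustration, not the authors' implementation.

```python
# Hedged sketch: injecting control features into a frozen autoregressive block.
import torch
import torch.nn as nn

class ControlInjectedBlock(nn.Module):
    def __init__(self, ar_block: nn.Module, hidden_dim: int, ctrl_dim: int):
        super().__init__()
        self.ar_block = ar_block  # pre-trained block, kept frozen
        for p in self.ar_block.parameters():
            p.requires_grad_(False)
        self.ctrl_proj = nn.Linear(ctrl_dim, hidden_dim)  # the only trainable part

    def forward(self, hidden: torch.Tensor, ctrl: torch.Tensor) -> torch.Tensor:
        # Add the projected control signal before the frozen computation,
        # steering this generation step without retraining the backbone.
        return self.ar_block(hidden + self.ctrl_proj(ctrl))

# Toy usage: wrap each step/scale of the pre-trained model with its own adapter,
# feeding progressively refined control features per scale.
block = ControlInjectedBlock(nn.Linear(64, 64), hidden_dim=64, ctrl_dim=32)
out = block(torch.randn(1, 16, 64), torch.randn(1, 16, 32))
```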
- ControlVAR: Exploring Controllable Visual Autoregressive Modeling [48.66209303617063]
Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs).
Challenges such as high computational cost, high inference latency, and difficulty of integration with large language models (LLMs) have necessitated exploring alternatives to DMs.
This paper introduces ControlVAR, a novel framework that explores pixel-level controls in visual autoregressive modeling for flexible and efficient conditional generation.
arXiv Detail & Related papers (2024-06-14T06:35:33Z)
- TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation [5.195293792493412]
We propose an innovative method that integrates Singular Value Decomposition into the Low-Rank Adaptation (LoRA) parameter update strategy.
By incorporating SVD within the LoRA framework, our method not only effectively reduces the risk of overfitting but also enhances the stability of model outputs.
arXiv Detail & Related papers (2024-05-18T09:29:00Z)
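To make the SVD-within-LoRA idea above concrete, here is a hedged sketch: truncating the SVD of a weight update at rank r yields its best rank-r approximation (Eckart-Young), which can then serve as, or constrain, the LoRA factor pair. This is one plausible reading of the summary, not TriLoRA's exact algorithm.

```python
# Hedged sketch: rank-r factorization of a weight update via truncated SVD,
# in the spirit of combining SVD with LoRA-style low-rank adapters.
import torch

def svd_lora_factors(delta_w: torch.Tensor, r: int):
    """Return LoRA-style factors B (d_out x r) and A (r x d_in) whose product
    is the best rank-r approximation of delta_w."""
    U, S, Vh = torch.linalg.svd(delta_w, full_matrices=False)
    B = U[:, :r] * S[:r]   # fold singular values into the left factor
    A = Vh[:r, :]
    return B, A

delta_w = torch.randn(768, 768)  # e.g., an attention-projection update
B, A = svd_lora_factors(delta_w, r=8)
approx = B @ A                   # rank-8 surrogate for the full update
print((delta_w - approx).norm() / delta_w.norm())  # relative approximation error
```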
- Image Inpainting via Tractable Steering of Diffusion Models [54.13818673257381]
This paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior.
Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs).
We show that our approach can consistently improve the overall quality and semantic coherence of inpainted images with only 10% additional computational overhead.
arXiv Detail & Related papers (2023-11-28T21:14:02Z)
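The probabilistic-circuit machinery itself is too involved for a short sketch; for orientation, the snippet below shows the standard constraint-enforcing baseline that such methods build on: after each denoising step, the known pixels are overwritten with a suitably noised copy of the observed image. This is the generic RePaint-style step, not the paper's PC-based exact posterior.

```python
# Hedged sketch of the common diffusion-inpainting baseline:
# re-impose the known pixels (suitably noised) after every denoising step.
import torch

def inpaint_step(x_t, known_image, mask, alpha_bar_prev, denoise_fn):
    """One reverse-diffusion step with hard constraint enforcement.
    mask == 1 marks known pixels; denoise_fn maps x_t to x_{t-1}."""
    x_prev = denoise_fn(x_t)  # unconstrained model step
    noise = torch.randn_like(known_image)
    # Forward-noise the observed pixels to the t-1 noise level.
    known_noised = (alpha_bar_prev.sqrt() * known_image
                    + (1 - alpha_bar_prev).sqrt() * noise)
    # Keep the model's output only where pixels are unknown.
    return mask * known_noised + (1 - mask) * x_prev
```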
- CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation [49.3016007471979]
Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by the high computational cost, which limits their real-time application.
We introduce a novel method, dubbed CoDi, that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
arXiv Detail & Related papers (2023-10-02T17:59:18Z)
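As one way to picture "accepting additional image conditioning inputs", the sketch below widens a pre-trained latent-diffusion UNet's first convolution so it can consume the noisy latents concatenated with an encoded condition image; zero-initializing the new weights means the adapted model initially behaves exactly like the original. This is a common adaptation pattern offered as an assumption; CoDi's actual recipe additionally distills the sampler for faster generation.

```python
# Hedged sketch: widening an input conv to accept extra condition channels.
import torch
import torch.nn as nn

def widen_input_conv(conv: nn.Conv2d, extra_in: int) -> nn.Conv2d:
    """Return a conv taking extra_in additional channels; the new weights start
    at zero so the extra condition is ignored until fine-tuning."""
    new = nn.Conv2d(conv.in_channels + extra_in, conv.out_channels,
                    conv.kernel_size, conv.stride, conv.padding)
    with torch.no_grad():
        new.weight.zero_()
        new.weight[:, :conv.in_channels] = conv.weight  # copy pre-trained weights
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new

# e.g., 4 latent channels for x_t plus 4 for the VAE-encoded condition image
conv_in = widen_input_conv(nn.Conv2d(4, 320, 3, padding=1), extra_in=4)
x_t = torch.randn(1, 4, 64, 64)
cond = torch.randn(1, 4, 64, 64)  # assumed encoded conditioning image
out = conv_in(torch.cat([x_t, cond], dim=1))
```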
- Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression.
Our method achieves superior diverse image generation performance compared with the state of the art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.