DiffMorph: Text-less Image Morphing with Diffusion Models
- URL: http://arxiv.org/abs/2401.00739v1
- Date: Mon, 1 Jan 2024 12:42:32 GMT
- Title: DiffMorph: Text-less Image Morphing with Diffusion Models
- Authors: Shounak Chatterjee
- Abstract summary: DiffMorph synthesizes images that mix concepts without the use of textual prompts.
DiffMorph takes an initial image with conditioning artist-drawn sketches to generate a morphed image.
We employ a pre-trained text-to-image diffusion model and fine-tune it to reconstruct each image faithfully.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-conditioned image generation models are a prevalent use of AI image
synthesis, yet intuitively controlling output guided by an artist remains
challenging. Current methods require multiple images and textual prompts for
each object to specify them as concepts to generate a single customized image.
On the other hand, our work, DiffMorph, introduces a novel approach
that synthesizes images that mix concepts without the use of textual prompts.
Our work integrates a sketch-to-image module to incorporate user sketches as
input. DiffMorph takes an initial image with conditioning artist-drawn
sketches to generate a morphed image.
We employ a pre-trained text-to-image diffusion model and fine-tune it to
reconstruct each image faithfully. We seamlessly merge images and concepts from
sketches into a cohesive composition. The image generation capability of our
work is demonstrated through our results and a comparison of these with
prompt-based image generation.
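DiffMorph's own code is not included on this page. Purely as an illustration of the kind of text-less, sketch-conditioned setup the abstract describes, the sketch below wires together off-the-shelf stand-ins (Stable Diffusion v1.5 with a scribble ControlNet via the diffusers library): an initial image plus an artist sketch drive generation while the text prompt stays empty. The checkpoints, file paths, and strength/guidance values are assumptions for illustration only, not DiffMorph's actual modules or fine-tuning procedure.

```python
# Illustrative stand-in, NOT the DiffMorph implementation: condition a
# pre-trained diffusion model on an initial image and an artist-drawn sketch,
# with an empty text prompt ("text-less" generation).
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Assumed off-the-shelf checkpoints (not from the paper).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Hypothetical input files.
init_image = Image.open("initial.png").convert("RGB").resize((512, 512))
sketch = Image.open("artist_sketch.png").convert("RGB").resize((512, 512))

# Empty prompt: the initial image and the sketch do all the conditioning.
morphed = pipe(
    prompt="",
    image=init_image,          # starting point to morph away from
    control_image=sketch,      # artist-drawn concept to merge in
    strength=0.7,              # how far to drift from the initial image (assumed)
    guidance_scale=1.0,        # low guidance since there is no text to follow
    num_inference_steps=30,
).images[0]
morphed.save("morphed.png")
```

In the paper's actual pipeline, the pre-trained model is additionally fine-tuned to reconstruct each input image faithfully before concepts are merged; the snippet above skips that step and only shows the conditioning side.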
Related papers
- Fast Personalized Text-to-Image Syntheses With Attention Injection [17.587109812987475]
We propose an effective and fast approach that balances the text-image consistency of the generated image with its identity consistency to the reference image.
Our method can generate personalized images without any fine-tuning while maintaining the inherent text-to-image generation ability of diffusion models.
arXiv Detail & Related papers (2024-03-17T17:42:02Z)
- Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Consistently portraying the same subject across diverse prompts remains challenging for text-to-image models.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model (a toy sketch of this sharing mechanism appears after this list).
arXiv Detail & Related papers (2024-02-05T18:42:34Z)
- Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model [22.975965453227477]
We introduce a new framework called Paste, Inpaint and Harmonize via Denoising (PhD).
In our experiments, we apply PhD to both subject-driven image editing tasks and explore text-driven scene generation given a reference subject.
arXiv Detail & Related papers (2023-06-13T07:43:10Z)
- Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models [80.75258849913574]
In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image?
We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects, and lighting from kitchen scenes, and discovering image classes given ImageNet images.
arXiv Detail & Related papers (2023-06-08T17:02:15Z)
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics from both aspects of input texts and images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation [10.39028769374367]
We present a new framework that takes text-to-image synthesis to the realm of image-to-image translation.
Our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text.
arXiv Detail & Related papers (2022-11-22T20:39:18Z)
- HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation [73.3790833537313]
Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on.
We propose HumanDiffusion, a coarse-to-fine alignment diffusion framework, for text-driven person image generation.
arXiv Detail & Related papers (2022-11-11T14:30:34Z)
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models, each specialized for a different stage of the synthesis process (a rough sketch of this routing idea appears after this list).
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
- AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation [61.77946020543875]
We propose a framework for translating raw descriptions with complex semantics into semantically corresponding images.
Our framework consists of two components: a projection module from Text Embeddings to Image Embeddings based on prompts, and an adapted image generation module built on StyleGAN.
Benefiting from the pre-trained models, our method can handle complex descriptions and does not require external paired data for training.
arXiv Detail & Related papers (2022-09-07T13:53:54Z)
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [26.748667878221568]
We present a new approach for "personalization" of text-to-image models.
We fine-tune a pretrained text-to-image model to bind a unique identifier with that specific subject (a minimal sketch of this fine-tuning step appears after this list).
The unique identifier can then be used to synthesize fully novel photorealistic images of the subject contextualized in different scenes.
arXiv Detail & Related papers (2022-08-25T17:45:49Z)
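The ConsiStory entry above attributes consistent subject generation to sharing the internal activations of the pretrained model. The function below is a toy, assumed PyTorch sketch of one such mechanism: extended self-attention in which every image in a batch can also attend to the other images' keys and values. It illustrates the general idea only; ConsiStory's actual method restricts and schedules this sharing in ways not shown here.

```python
# Toy sketch (assumed, simplified): batch-shared self-attention so that a
# batch of generations tends toward a consistent subject appearance.
import torch


def shared_self_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (batch, tokens, dim) projections from one self-attention layer."""
    b, n, d = k.shape
    # Pool keys/values across the whole batch so every sample can attend to them.
    k_all = k.reshape(1, b * n, d).expand(b, -1, -1)
    v_all = v.reshape(1, b * n, d).expand(b, -1, -1)
    attn = torch.softmax(q @ k_all.transpose(1, 2) / d ** 0.5, dim=-1)  # (b, n, b*n)
    return attn @ v_all  # (b, n, d)


# Usage with random tensors standing in for UNet attention activations.
q, k, v = (torch.randn(4, 64, 320) for _ in range(3))
out = shared_self_attention(q, k, v)  # shape (4, 64, 320)
```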
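The eDiffi entry above describes specializing denoisers for different stages of synthesis. Below is a rough, hypothetical sketch of that routing idea: each expert owns a timestep interval and is selected by the current noise level. The interval boundaries and the denoiser call signature are assumptions for illustration, not eDiffi's configuration.

```python
# Hypothetical sketch of timestep-based expert routing (not eDiffi's code).
from typing import Callable, List, Tuple

Expert = Tuple[int, int, Callable]  # (t_low, t_high, denoiser)


def route_denoiser(experts: List[Expert], t: int) -> Callable:
    """Return the expert whose interval [t_low, t_high) contains timestep t."""
    for t_low, t_high, denoiser in experts:
        if t_low <= t < t_high:
            return denoiser
    raise ValueError(f"no expert covers timestep {t}")


# Example: one expert for the early, high-noise steps and one for the final,
# detail-oriented steps (the split at t=500 is an arbitrary assumption).
experts: List[Expert] = [
    (500, 1000, lambda x, t, text_emb: x),  # placeholder high-noise denoiser
    (0, 500, lambda x, t, text_emb: x),     # placeholder low-noise denoiser
]
denoiser = route_denoiser(experts, t=750)
```

Because only one expert is invoked at each step, the ensemble keeps roughly the same inference cost as a single model, which is the trade-off the entry highlights.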
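The DreamBooth entry above describes binding a unique identifier to a specific subject by fine-tuning a pre-trained text-to-image model. The function below is a minimal sketch of one such training step, assuming a Stable-Diffusion-style stack loaded via the diffusers/transformers libraries; the rare token "sks", the prompt template, and the omission of DreamBooth's class prior-preservation loss are simplifications for illustration, not the authors' exact recipe.

```python
# Minimal sketch (assumed details, not the authors' code): one fine-tuning step
# that ties a rare identifier token to a handful of subject photos.
import torch
import torch.nn.functional as F


def dreambooth_step(unet, vae, text_encoder, tokenizer, noise_scheduler,
                    subject_pixels, identifier="sks", class_name="dog"):
    """subject_pixels: (batch, 3, H, W) photos of the subject, in VAE input range."""
    device = subject_pixels.device
    # Prompt with the unique identifier, e.g. "a photo of sks dog".
    prompt = f"a photo of {identifier} {class_name}"
    ids = tokenizer([prompt] * subject_pixels.shape[0], padding="max_length",
                    max_length=tokenizer.model_max_length, truncation=True,
                    return_tensors="pt").input_ids.to(device)
    text_emb = text_encoder(ids)[0]

    # Encode images to latents and add noise at a random timestep.
    latents = vae.encode(subject_pixels).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=device)
    noisy = noise_scheduler.add_noise(latents, noise, t)

    # Standard noise-prediction loss; minimizing it binds the identifier token
    # (and hence the prompt) to the subject's appearance.
    pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    return F.mse_loss(pred, noise)
```

In the full method a second, prior-preservation term on generic images of the same class keeps the model from forgetting what an ordinary instance of that class looks like; it is omitted here for brevity.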