Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models
- URL: http://arxiv.org/abs/2306.05182v1
- Date: Mon, 15 May 2023 18:38:25 GMT
- Title: Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models
- Authors: Krishna Sri Ipsit Mantri and Nevasini Sasikumar
- Abstract summary: Fashionable image generation aims to synthesize images of diverse fashion prevalent around the globe.
We propose a method that exploits the equivalence between diffusion models and energy-based models (EBMs) and suggests ways to compose multiple probability distributions.
Our results indicate that using an LLM to refine the prompts to the latent diffusion model assists in generating globally creative and culturally diversified fashion styles.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fashionable image generation aims to synthesize images of diverse fashion
prevalent around the globe, helping fashion designers in real-time
visualization by giving them a basic customized structure of how a specific
design preference would look in real life and what further improvements can be
made for enhanced customer satisfaction. Moreover, users themselves can interact with the system and generate fashionable images by giving just a few simple prompts. Recently,
diffusion models have gained popularity as generative models owing to their
flexibility and generation of realistic images from Gaussian noise. Latent
diffusion models are a type of generative model that use diffusion processes to
model the generation of complex data, such as images, audio, or text. They are
called "latent" because they learn a hidden representation, or latent variable,
of the data that captures its underlying structure. We propose a method that
exploits the equivalence between diffusion models and energy-based models
(EBMs) and suggests ways to compose multiple probability distributions. We
describe a pipeline showing how our method can be used specifically for new
fashionable outfit generation and virtual try-on using LLM-guided text-to-image
generation. Our results indicate that using an LLM to refine the prompts to the
latent diffusion model assists in generating globally creative and culturally
diversified fashion styles and reducing bias.
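To make the abstract's central claim concrete, the equivalence between diffusion models and EBMs, and the resulting composition of probability distributions, can be written in the standard score/energy form. The notation below (noise predictor \epsilon_\theta, implicit energy E_\theta, prompt conditions c_i, weights w_i) is illustrative and not taken from the paper; it is a hedged sketch of the commonly used formulation rather than the authors' exact derivation.

    % The noise predictor behaves as a scaled (negative) score, i.e. the gradient
    % of an implicit energy E_theta:
    \epsilon_\theta(x_t, t \mid c) \propto -\nabla_{x_t} \log p_\theta(x_t \mid c) = \nabla_{x_t} E_\theta(x_t, t \mid c)

    % A product of prompt-conditioned distributions corresponds to a sum of energies,
    % which at sampling time becomes a weighted sum of noise predictions:
    \tilde{\epsilon}(x_t, t) = \epsilon_\theta(x_t, t) + \sum_{i=1}^{n} w_i \bigl[ \epsilon_\theta(x_t, t \mid c_i) - \epsilon_\theta(x_t, t) \bigr]

The composed prediction \tilde{\epsilon} is then used in the ordinary reverse-diffusion update, so several conditions (for example, a garment type and a regional style) can be combined within a single sampling run.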
Related papers
- Diffusion Models For Multi-Modal Generative Modeling [32.61765315067488]
We propose a principled way to define a diffusion model by constructing a unified multi-modal diffusion model in a common diffusion space.
We propose several multimodal generation settings to verify our framework, including image transition, masked-image training, joint image-label and joint image-representation generative modeling.
arXiv Detail & Related papers (2024-07-24T18:04:17Z)
- FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion [11.646594594565098]
This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models.
We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data.
arXiv Detail & Related papers (2024-04-26T14:59:42Z)
- Diffusion Cocktail: Mixing Domain-Specific Diffusion Models for Diversified Image Generations [7.604214200457584]
Diffusion Cocktail (Ditail) is a training-free method that transfers style and content information between multiple diffusion models.
Ditail offers fine-grained control of the generation process, which enables flexible manipulations of styles and contents.
arXiv Detail & Related papers (2023-12-12T00:53:56Z)
- SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder, which distills a source view into a compact representation that guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models [62.75006608940132]
This work proposes to enhance prompt understanding capabilities in text-to-image diffusion models.
Our method leverages a pretrained large language model for grounded generation in a novel two-stage process.
Our method significantly outperforms the base diffusion model and several strong baselines in accurately generating images.
arXiv Detail & Related papers (2023-05-23T03:59:06Z)
- SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach can make text-to-image diffusion models easier to use with better user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z)
- Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC [102.64648158034568]
Diffusion models have quickly become the prevailing approach to generative modeling in many domains.
We propose an energy-based parameterization of diffusion models which enables the use of new compositional operators.
We find these samplers lead to notable improvements in compositional generation across a wide set of problems.
arXiv Detail & Related papers (2023-02-22T18:48:46Z)
- Extracting Training Data from Diffusion Models [77.11719063152027]
We show that diffusion models memorize individual images from their training data and emit them at generation time.
With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models.
We train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy.
arXiv Detail & Related papers (2023-01-30T18:53:09Z)
- Implementing and Experimenting with Diffusion Models for Text-to-Image Generation [0.0]
Two models, DALL-E 2 and Imagen, have demonstrated that highly photorealistic images could be generated from a simple textual description of an image.
Text-to-image models require exceptionally large amounts of computational resources to train, as well as huge datasets collected from the internet.
This thesis contributes by reviewing the different approaches and techniques used by these models, and then by proposing our own implementation of a text-to-image model.
arXiv Detail & Related papers (2022-09-22T12:03:33Z)
- Compositional Visual Generation with Composable Diffusion Models [80.75258849913574]
We propose an alternative structured approach for compositional generation using diffusion models.
An image is generated by composing a set of diffusion models, with each of them modeling a certain component of the image.
The proposed method can generate scenes at test time that are substantially more complex than those seen in training.
arXiv Detail & Related papers (2022-06-03T17:47:04Z)
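The LLM-guided prompt-refinement step described in the abstract, and echoed in the LLM-grounded Diffusion and SUR-adapter entries above, can be illustrated with a short sketch. The Python below is a minimal example under stated assumptions: it uses a Stable-Diffusion-style checkpoint loaded through the diffusers library, and refine_prompt is a hypothetical placeholder standing in for any LLM call; none of the names, checkpoints, or settings are taken from the paper's implementation.

    # Minimal sketch: LLM-refined prompt -> latent diffusion sampling.
    # Assumes the `diffusers` and `torch` packages and a CUDA device; the
    # checkpoint name and all generation settings are illustrative.
    import torch
    from diffusers import StableDiffusionPipeline

    def refine_prompt(user_prompt: str) -> str:
        """Hypothetical placeholder for an LLM call that rewrites a terse user
        request into a richer, culturally specific fashion description."""
        return (
            f"{user_prompt}, detailed traditional embroidery, vibrant regional "
            f"color palette, full-body studio photograph of the outfit"
        )

    def generate_fashion_image(user_prompt: str, seed: int = 0):
        # Load a pretrained latent diffusion model (checkpoint is an assumption).
        pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
        ).to("cuda")

        # Step 1: expand the user's short request into a refined prompt via the LLM.
        refined = refine_prompt(user_prompt)

        # Step 2: sample from the latent diffusion model conditioned on that prompt.
        generator = torch.Generator(device="cuda").manual_seed(seed)
        return pipe(refined, num_inference_steps=30, generator=generator).images[0]

    if __name__ == "__main__":
        generate_fashion_image("a saree-inspired evening gown").save("fashion_sample.png")

In an interactive setting the same loop would be run repeatedly as the user iterates, optionally combining several refined prompts with the compositional update sketched after the abstract.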