SGDiff: A Style Guided Diffusion Model for Fashion Synthesis
- URL: http://arxiv.org/abs/2308.07605v1
- Date: Tue, 15 Aug 2023 07:20:22 GMT
- Title: SGDiff: A Style Guided Diffusion Model for Fashion Synthesis
- Authors: Zhengwentai Sun, Yanghong Zhou, Honghong He, P. Y. Mok
- Abstract summary: The proposed SGDiff combines image modality with a pretrained text-to-image diffusion model to facilitate creative fashion image synthesis.
It addresses the limitations of text-to-image diffusion models by incorporating supplementary style guidance.
This paper also introduces a new dataset -- SG-Fashion, specifically designed for fashion image synthesis applications.
- Score: 2.4578723416255754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper reports on the development of a novel style guided
diffusion model (SGDiff), which overcomes certain weaknesses inherent in
existing models for image synthesis. The proposed SGDiff combines image
modality with a pretrained text-to-image diffusion model to facilitate creative
fashion image synthesis. It addresses the limitations of text-to-image
diffusion models by incorporating supplementary style guidance, substantially
reducing training costs, and overcoming the difficulties of controlling
synthesized styles with text-only inputs. This paper also introduces a new
dataset -- SG-Fashion, specifically designed for fashion image synthesis
applications, offering high-resolution images and an extensive range of garment
categories. Through a comprehensive ablation study, we examine the
application of classifier-free guidance to a variety of conditions and validate
the effectiveness of the proposed model for generating fashion images of the
desired categories, product attributes, and styles. The contributions of this
paper include a novel classifier-free guidance method for multi-modal feature
fusion, a comprehensive dataset for fashion image synthesis applications, a
thorough investigation of conditioned text-to-image synthesis, and valuable
insights for future research in the text-to-image synthesis domain. The code
and dataset are available at: https://github.com/taited/SGDiff.
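To make the abstract's central idea concrete, the sketch below illustrates classifier-free guidance applied to a fused text-plus-style condition during a single denoising step. It is a minimal illustration under assumed interfaces; `denoiser`, the embedding arguments, the concatenation-based fusion, and the guidance scale `w` are placeholders, not the released SGDiff implementation, whose actual fusion module and condition dropout follow the paper and code repository.

```python
import torch


@torch.no_grad()
def guided_noise_prediction(denoiser, x_t, t, text_emb, style_emb,
                            null_text_emb, null_style_emb, w=3.0):
    """One classifier-free-guided noise prediction with a fused condition.

    denoiser(x_t, t, cond) -> predicted noise (hypothetical interface)
    text_emb / style_emb   -> embeddings from text and style-image encoders
    null_*_emb             -> embeddings of the dropped (empty) conditions
    w                      -> guidance scale
    """
    # Fuse the two modalities into one conditioning vector. SGDiff fuses
    # image-modality style features with the pretrained text condition;
    # plain concatenation here is only a stand-in for that fusion module.
    cond = torch.cat([text_emb, style_emb], dim=-1)
    uncond = torch.cat([null_text_emb, null_style_emb], dim=-1)

    eps_cond = denoiser(x_t, t, cond)      # conditional noise estimate
    eps_uncond = denoiser(x_t, t, uncond)  # unconditional noise estimate

    # Classifier-free guidance: extrapolate away from the unconditional
    # estimate and toward the conditional one by the guidance scale w.
    return eps_uncond + w * (eps_cond - eps_uncond)
```

The same form extends to the paper's ablation setting: each condition (category, attribute, style) can be dropped independently during training, and the guidance scale then trades off fidelity to the conditions against sample diversity at inference time.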
Related papers
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS)
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
- Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method in ten different benchmarks, consistently outperforming baselines and establishing a new state-of-the-art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z)
- Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation [10.389698647141296]
Few-shot image generation aims to produce plausible and diverse images for one category given a few images from this category.
Existing approaches either globally interpolate different images or fuse local representations with pre-defined coefficients.
This paper proposes a novel mechanism to inject external semantic signals into internal local representations.
arXiv Detail & Related papers (2023-08-30T16:10:21Z)
- Text-to-image Diffusion Models in Generative AI: A Survey [75.32882187215394]
We present a review of state-of-the-art methods on text-conditioned image synthesis, i.e., text-to-image.
We discuss applications beyond text-to-image generation: text-guided creative generation and text-guided image editing.
arXiv Detail & Related papers (2023-03-14T13:49:54Z)
- StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis [54.39789900854696]
StyleGAN-T addresses the specific requirements of large-scale text-to-image synthesis.
It significantly improves over previous GANs and outperforms distilled diffusion models in terms of sample quality and speed.
arXiv Detail & Related papers (2023-01-23T16:05:45Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models [12.676356746752894]
We present an alternative approach based on retrieval-augmented diffusion models (RDMs).
We replace the retrieval database with a more specialized database that contains only images of a particular visual style.
This provides a novel way to prompt a general trained model after training and thereby specify a particular visual style.
arXiv Detail & Related papers (2022-07-26T16:56:51Z)
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [16.786221846896108]
We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance.
We find that classifier-free guidance is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples.
Our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing.
arXiv Detail & Related papers (2021-12-20T18:42:55Z)
- More Control for Free! Image Synthesis with Semantic Diffusion Guidance [79.88929906247695]
Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from an example image.
We introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.
We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis.
arXiv Detail & Related papers (2021-12-10T18:55:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.