Hierarchical Fashion Design with Multi-stage Diffusion Models
- URL: http://arxiv.org/abs/2401.07450v3
- Date: Sat, 20 Jan 2024 05:21:13 GMT
- Title: Hierarchical Fashion Design with Multi-stage Diffusion Models
- Authors: Zhifeng Xie, Hao Li, Huiming Ding, Mengtian Li, Ying Cao
- Abstract summary: Cross-modal fashion synthesis and editing offer intelligent support to fashion designers.
Current diffusion models demonstrate commendable stability and controllability in image synthesis.
We propose HieraFashDiff, a novel fashion design method using a shared multi-stage diffusion model.
- Score: 17.848891542772446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modal fashion synthesis and editing offer intelligent support to
fashion designers by enabling the automatic generation and local modification
of design drafts. While current diffusion models demonstrate commendable
stability and controllability in image synthesis, they still face significant
challenges in generating fashion designs from abstract design elements and in
fine-grained editing. Abstract sensory expressions, e.g., office, business, and
party, form the high-level design concepts, while measurable aspects like
sleeve length, collar type, and pant length are considered the low-level
attributes of clothing. Controlling and editing fashion images with lengthy
text descriptions is difficult. In this paper, we propose HieraFashDiff, a
novel fashion design method using a shared multi-stage diffusion model that
encompasses high-level design concepts and low-level clothing attributes in a
hierarchical structure. Specifically, we categorize the input text into
different levels and feed them to the diffusion model at different time steps,
following the criteria of professional clothing designers. HieraFashDiff
allows designers to incrementally add low-level attributes after high-level
prompts for interactive editing. In addition, we design a differentiable loss
function, applied with a mask during the sampling process, to preserve
non-edited areas. Comprehensive experiments on our newly collected
Hierarchical Fashion dataset demonstrate that our proposed method outperforms
other state-of-the-art competitors.
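
The staged conditioning described above, where high-level concepts drive the early denoising steps and low-level attributes drive the later refinement steps, can be sketched as a sampling loop that swaps the text embedding partway through. The sketch below is illustrative only: the `denoise_step` callable, the single `switch_ratio` cut-over point, and all tensor shapes are assumptions, not details from the paper.

```python
import torch
from typing import Callable

def sample_hierarchical(
    denoise_step: Callable[[torch.Tensor, int, torch.Tensor], torch.Tensor],
    cond_high: torch.Tensor,    # embedding of high-level design concepts
    cond_low: torch.Tensor,     # embedding of low-level clothing attributes
    num_steps: int = 50,
    switch_ratio: float = 0.7,  # fraction of steps driven by high-level prompts
    shape=(1, 4, 64, 64),
) -> torch.Tensor:
    """Reverse-diffusion loop that feeds high-level concepts at early (noisy)
    timesteps and low-level attributes at later (refinement) timesteps."""
    x = torch.randn(shape)  # start from pure Gaussian noise
    switch_at = int(switch_ratio * num_steps)
    for i, t in enumerate(reversed(range(num_steps))):
        # Early steps shape the global design; late steps refine attributes.
        cond = cond_high if i < switch_at else cond_low
        x = denoise_step(x, t, cond)  # one denoising step under `cond`
    return x

# Usage with a stand-in denoiser; a real diffusion model would replace the lambda.
dummy_step = lambda x, t, cond: 0.99 * x
draft = sample_hierarchical(dummy_step,
                            cond_high=torch.zeros(1, 77, 768),
                            cond_low=torch.ones(1, 77, 768))
```

A real implementation would derive the switch point from the professional designers' criteria mentioned in the abstract rather than a fixed ratio.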
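The masked, differentiable loss used during sampling to keep non-edited areas can likewise be sketched as a guidance term: penalize deviation from the original latent outside the edit mask, then nudge the current latent down that gradient. The L2 form, the `guidance_scale` value, and the per-step application are assumptions; the abstract does not specify the exact loss.

```python
import torch

def masked_preservation_guidance(x_t: torch.Tensor,
                                 x_orig_t: torch.Tensor,
                                 mask: torch.Tensor,
                                 guidance_scale: float = 0.5) -> torch.Tensor:
    """One guidance step: a differentiable L2 loss penalizes deviation from
    the original (noised) latent outside the edit mask, and the current
    latent is nudged down that loss gradient.
    `mask` is 1 inside the editable region and 0 where content must be kept."""
    x = x_t.detach().requires_grad_(True)
    loss = (((1.0 - mask) * (x - x_orig_t)) ** 2).mean()
    (grad,) = torch.autograd.grad(loss, x)
    return x_t - guidance_scale * grad  # pull non-edit areas back to original

# Usage on dummy latents: preserve everything outside a central edit window.
x_t = torch.randn(1, 4, 64, 64)
x_orig_t = torch.randn(1, 4, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0  # hypothetical editable region
x_t = masked_preservation_guidance(x_t, x_orig_t, mask)
```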
Related papers
- DiCTI: Diffusion-based Clothing Designer via Text-guided Input [5.275658744475251]
DiCTI (Diffusion-based Clothing Designer via Text-guided Input) allows designers to quickly visualize fashion-related ideas using text inputs only.
By leveraging a powerful diffusion-based inpainting model conditioned on text inputs, DiCTI is able to synthesize convincing, high-quality images with varied clothing designs.
arXiv Detail & Related papers (2024-07-04T12:48:36Z)
- MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis [65.78359025027457]
MetaDesigner revolutionizes artistic typography by leveraging the strengths of Large Language Models (LLMs) to drive a design paradigm centered around user engagement.
A comprehensive feedback mechanism harnesses insights from multimodal models and user evaluations to refine and enhance the design process iteratively.
Empirical validations highlight MetaDesigner's capability to effectively serve diverse WordArt applications, consistently producing aesthetically appealing and context-sensitive results.
arXiv Detail & Related papers (2024-06-28T11:58:26Z)
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We conduct extensive experiments and achieve state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
- FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion [11.646594594565098]
This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models.
We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data.
arXiv Detail & Related papers (2024-04-26T14:59:42Z)
- Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models [81.6240188672294]
In graphic design, non-professional users often struggle to create visually appealing layouts due to limited skills and resources.
We introduce a novel multimodal instruction-following framework for layout planning, allowing users to easily arrange visual elements into tailored layouts.
Our method not only simplifies the design process for non-professionals but also surpasses few-shot GPT-4V models, achieving a 12% higher mIoU on Crello.
arXiv Detail & Related papers (2024-04-23T17:58:33Z)
- Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
This paper tackles the task of multimodal-conditioned fashion image editing.
Our approach aims to generate human-centric fashion images guided by multimodal prompts, including text, human body poses, garment sketches, and fabric textures.
arXiv Detail & Related papers (2024-03-21T20:43:10Z)
- HAIFIT: Human-to-AI Fashion Image Translation [6.034505799418777]
We introduce HAIFIT, a novel approach that transforms sketches into high-fidelity, lifelike clothing images.
Our method excels in preserving the distinctive style and intricate details essential for fashion design applications.
arXiv Detail & Related papers (2024-03-13T16:06:07Z)
- Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints [53.66698106829144]
We propose a unified model to handle a broad range of layout generation tasks.
The model is based on continuous diffusion models.
Experimental results show that LACE produces high-quality layouts.
arXiv Detail & Related papers (2024-02-07T11:12:41Z)
- HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image or video.
We propose HiCAST, which is capable of explicitly customizing the stylization results according to various sources of semantic cues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
arXiv Detail & Related papers (2024-01-11T12:26:23Z)
- Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
We propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images.
We tackle this problem by proposing a new architecture based on latent diffusion models.
Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets.
arXiv Detail & Related papers (2023-04-04T18:03:04Z)
- FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning [66.38951790650887]
Multimodal tasks in the fashion domain have significant potential for e-commerce.
We propose a novel fashion-specific pre-training framework based on weakly-supervised triplets constructed from fashion image-text pairs.
We show the triplet-based tasks are an effective addition to standard multimodal pre-training tasks.
arXiv Detail & Related papers (2022-10-26T21:01:19Z)