HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models
- URL: http://arxiv.org/abs/2401.07450v4
- Date: Thu, 12 Dec 2024 10:36:14 GMT
- Title: HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models
- Authors: Zhifeng Xie, Hao Li, Huiming Ding, Mengtian Li, Xinhan Di, Ying Cao
- Abstract summary: We propose a novel hierarchical diffusion-based framework tailored for fashion design, coined as HieraFashDiff.
Our model is designed to mimic the practical fashion design workflow by unraveling the denoising process into two successive stages.
Our model supports fashion design generation and fine-grained local editing in a single framework.
- Score: 17.74292177764933
- License:
- Abstract: Fashion design is a challenging and complex process. Recent works on fashion generation and editing are all agnostic of the actual fashion design process, which limits their usage in practice. In this paper, we propose a novel hierarchical diffusion-based framework tailored for fashion design, coined as HieraFashDiff. Our model is designed to mimic the practical fashion design workflow by unraveling the denoising process into two successive stages: 1) an ideation stage that generates design proposals given high-level concepts and 2) an iteration stage that continuously refines the proposals using low-level attributes. Our model supports fashion design generation and fine-grained local editing in a single framework. To train our model, we contribute a new dataset of full-body fashion images annotated with hierarchical text descriptions. Extensive evaluations show that, as compared to prior approaches, our method can generate fashion designs and edited results with higher fidelity and better prompt adherence, showing its promising potential to augment the practical fashion design workflow. Code and Dataset are available at https://github.com/haoli-zbdbc/hierafashdiff.
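The two-stage split described in the abstract amounts to partitioning the reverse-diffusion trajectory by conditioning signal: early denoising steps are guided by the high-level concept prompt (ideation), and the remaining steps by low-level attribute prompts (iteration). The following is a minimal sketch of that idea, not the authors' implementation; `model`, `encode_prompt`, and `denoise_step` are hypothetical placeholders for a generic text-conditioned latent diffusion backbone (the actual code is in the repository linked above).

```python
# Minimal sketch of hierarchical two-stage sampling. All callables here
# (encode_prompt, denoise_step) are assumed interfaces, not HieraFashDiff's API.
import torch

@torch.no_grad()
def hierarchical_sample(model, encode_prompt, denoise_step,
                        high_level_prompt, low_level_prompt,
                        num_steps=50, ideation_frac=0.6,
                        shape=(1, 4, 64, 64)):
    """Condition the first `ideation_frac` of denoising steps on a
    high-level concept, then switch to low-level attribute prompts."""
    x = torch.randn(shape)                      # start from Gaussian noise
    c_high = encode_prompt(high_level_prompt)   # e.g. "oversized wool coat"
    c_low = encode_prompt(low_level_prompt)     # e.g. "notched lapel, waist belt"
    switch = int(num_steps * ideation_frac)
    for i, t in enumerate(reversed(range(num_steps))):
        cond = c_high if i < switch else c_low  # ideation -> iteration
        x = denoise_step(model, x, t, cond)     # one reverse-diffusion step
    return x
```

Because the iteration stage acts on a partially denoised latent rather than fresh noise, the same loop structure lends itself to the repeated attribute-level refinement and local editing the paper describes.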
Related papers
- Learning to Synthesize Compatible Fashion Items Using Semantic Alignment and Collocation Classification: An Outfit Generation Framework [59.09707044733695]
We propose a novel outfit generation framework, i.e., OutfitGAN, with the aim of synthesizing an entire outfit.
OutfitGAN includes a semantic alignment module, which is responsible for characterizing the mapping correspondence between the existing fashion items and the synthesized ones.
In order to evaluate the performance of our proposed models, we built a large-scale dataset consisting of 20,000 fashion outfits.
arXiv Detail & Related papers (2025-02-05T12:13:53Z)
- EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks.
The model takes both images and instructions as inputs, and predicts the edited image tokens in a vanilla next-token paradigm (a toy sketch of this paradigm appears after this list).
We evaluate its effectiveness across diverse tasks on established benchmarks, showing competitive performance to various state-of-the-art task-specific methods.
arXiv Detail & Related papers (2025-01-08T18:59:35Z)
- FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion [11.646594594565098]
This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models.
We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data.
arXiv Detail & Related papers (2024-04-26T14:59:42Z)
- Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models [81.6240188672294]
In graphic design, non-professional users often struggle to create visually appealing layouts due to limited skills and resources.
We introduce a novel multimodal instruction-following framework for layout planning, allowing users to easily arrange visual elements into tailored layouts.
Our method not only simplifies the design process for non-professionals but also surpasses the performance of few-shot GPT-4V models, with mIoU higher by 12% on Crello.
arXiv Detail & Related papers (2024-04-23T17:58:33Z)
- Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
This paper tackles the task of multimodal-conditioned fashion image editing.
Our approach aims to generate human-centric fashion images guided by multimodal prompts, including text, human body poses, garment sketches, and fabric textures.
arXiv Detail & Related papers (2024-03-21T20:43:10Z)
- HAIFIT: Human-to-AI Fashion Image Translation [6.034505799418777]
We introduce HAIFIT, a novel approach that transforms sketches into high-fidelity, lifelike clothing images.
Our method excels in preserving the distinctive style and intricate details essential for fashion design applications.
arXiv Detail & Related papers (2024-03-13T16:06:07Z)
- Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints [53.66698106829144]
We propose a unified model to handle a broad range of layout generation tasks.
The model is based on continuous diffusion models.
Experiment results show that LACE produces high-quality layouts.
arXiv Detail & Related papers (2024-02-07T11:12:41Z)
- FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training [12.652002299515864]
We propose a method for fine-grained fashion vision-language pre-training based on fashion Symbols and Attributes Prompt (FashionSAP).
Firstly, we propose the fashion symbols, a novel abstract fashion concept layer, to represent different fashion items.
Secondly, the attributes prompt method is proposed to make the model learn specific attributes of fashion items explicitly.
arXiv Detail & Related papers (2023-04-11T08:20:17Z)
- Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
We propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images.
We tackle this problem by proposing a new architecture based on latent diffusion models.
Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets.
arXiv Detail & Related papers (2023-04-04T18:03:04Z)
- FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning [66.38951790650887]
Multimodal tasks in the fashion domain have significant potential for e-commerce.
We propose a novel fashion-specific pre-training framework based on weakly-supervised triplets constructed from fashion image-text pairs.
We show the triplet-based tasks are an effective addition to standard multimodal pre-training tasks.
arXiv Detail & Related papers (2022-10-26T21:01:19Z)
- Modeling Artistic Workflows for Image Generation and Editing [83.43047077223947]
We propose a generative model that follows a given artistic workflow.
It enables both multi-stage image generation as well as multi-stage image editing of an existing piece of art.
arXiv Detail & Related papers (2020-07-14T17:54:26Z)
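As referenced in the EditAR entry above, the "vanilla next-token paradigm" can be summarized in a few lines. This is a toy sketch under assumed interfaces: `tokenizer` (a discrete image/text tokenizer) and `transformer` (a decoder-only model returning next-token logits) are hypothetical placeholders, not EditAR's actual API.

```python
# Toy sketch of next-token conditional image editing; tokenizer and
# transformer are assumed interfaces, not EditAR's real components.
import torch

@torch.no_grad()
def edit_image_next_token(transformer, tokenizer, image, instruction,
                          num_image_tokens=256):
    """Greedily decode edited-image tokens, conditioned on the source
    image tokens and the instruction tokens."""
    prefix = torch.cat([tokenizer.encode_image(image),       # source image tokens
                        tokenizer.encode_text(instruction)], # instruction tokens
                       dim=-1)
    seq = prefix
    for _ in range(num_image_tokens):
        logits = transformer(seq)                    # (batch, len, vocab)
        next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)
        seq = torch.cat([seq, next_tok], dim=-1)     # append predicted token
    return tokenizer.decode_image(seq[:, prefix.shape[-1]:])
```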