Multimodal Garment Designer: Human-Centric Latent Diffusion Models for
Fashion Image Editing
- URL: http://arxiv.org/abs/2304.02051v2
- Date: Wed, 23 Aug 2023 12:45:27 GMT
- Title: Multimodal Garment Designer: Human-Centric Latent Diffusion Models for
Fashion Image Editing
- Authors: Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia,
Marco Bertini, Rita Cucchiara
- Abstract summary: We propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images.
We tackle this problem by proposing a new architecture based on latent diffusion models.
Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets.
- Score: 40.70752781891058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Fashion illustration is used by designers to communicate their vision and to
bring the design idea from conceptualization to realization, showing how
clothes interact with the human body. In this context, computer vision can thus
be used to improve the fashion design process. Differently from previous works
that mainly focused on the virtual try-on of garments, we propose the task of
multimodal-conditioned fashion image editing, guiding the generation of
human-centric fashion images by following multimodal prompts, such as text,
human body poses, and garment sketches. We tackle this problem by proposing a
new architecture based on latent diffusion models, an approach that has not
been used before in the fashion domain. Given the lack of existing datasets
suitable for the task, we also extend two existing fashion datasets, namely
Dress Code and VITON-HD, with multimodal annotations collected in a
semi-automatic manner. Experimental results on these new datasets demonstrate
the effectiveness of our proposal, both in terms of realism and coherence with
the given multimodal inputs. Source code and collected multimodal annotations
are publicly available at:
https://github.com/aimagelab/multimodal-garment-designer.
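The abstract describes conditioning a latent diffusion denoiser on several modalities at once (text, body pose, garment sketch). As a rough illustration of that idea only, and not the paper's actual architecture, the toy module below concatenates spatial conditions (pose, sketch) to the noisy latent along the channel axis and injects a projected text embedding additively; the class name, channel sizes, and the additive text injection (a simplification of cross-attention) are all hypothetical.

```python
import torch
import torch.nn as nn

class ToyMultimodalDenoiser(nn.Module):
    """Hypothetical stand-in for a multimodal-conditioned denoising network.

    Spatial conditions (pose map, sketch map) are concatenated to the noisy
    latent on the channel axis; the text embedding is projected and added,
    broadcast over spatial dimensions.
    """
    def __init__(self, latent_ch=4, pose_ch=1, sketch_ch=1, text_dim=32):
        super().__init__()
        in_ch = latent_ch + pose_ch + sketch_ch
        self.conv = nn.Conv2d(in_ch, latent_ch, kernel_size=3, padding=1)
        self.text_proj = nn.Linear(text_dim, latent_ch)

    def forward(self, noisy_latent, pose, sketch, text_emb):
        # Stack latent and spatial conditions channel-wise.
        x = torch.cat([noisy_latent, pose, sketch], dim=1)
        h = self.conv(x)
        # Broadcast the projected text embedding over spatial dims.
        t = self.text_proj(text_emb)[:, :, None, None]
        return h + t  # predicted noise, same shape as the latent

model = ToyMultimodalDenoiser()
latent = torch.randn(2, 4, 8, 8)
pose = torch.randn(2, 1, 8, 8)
sketch = torch.randn(2, 1, 8, 8)
text = torch.randn(2, 32)
eps_hat = model(latent, pose, sketch, text)
print(eps_hat.shape)  # torch.Size([2, 4, 8, 8])
```

In practice, systems like the one described use a U-Net with cross-attention for text and richer condition encoders; this sketch only shows the shape bookkeeping of multimodal conditioning.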
Related papers
- AIpparel: A Large Multimodal Generative Model for Digital Garments [71.12933771326279]
We introduce AIpparel, a large multimodal model for generating and editing sewing patterns.
Our model fine-tunes state-of-the-art large multimodal models on a custom-curated large-scale dataset of over 120,000 unique garments.
We propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently.
arXiv Detail & Related papers (2024-12-05T07:35:19Z)
- UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation [29.489516715874306]
We present UniFashion, a unified framework that simultaneously tackles the challenges of multimodal generation and retrieval tasks within the fashion domain.
Our model significantly outperforms previous single-task state-of-the-art models across diverse fashion tasks.
arXiv Detail & Related papers (2024-08-21T03:17:20Z)
- FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion [11.646594594565098]
This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models.
We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data.
arXiv Detail & Related papers (2024-04-26T14:59:42Z)
- Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
This paper tackles the task of multimodal-conditioned fashion image editing.
Our approach aims to generate human-centric fashion images guided by multimodal prompts, including text, human body poses, garment sketches, and fabric textures.
arXiv Detail & Related papers (2024-03-21T20:43:10Z)
- HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models [17.74292177764933]
We propose a novel hierarchical diffusion-based framework tailored for fashion design, coined as HieraFashDiff.
Our model is designed to mimic the practical fashion design workflow by unraveling the denoising process into two successive stages.
Our model supports fashion design generation and fine-grained local editing in a single framework.
arXiv Detail & Related papers (2024-01-15T03:38:57Z)
- Fashion Matrix: Editing Photos by Just Talking [66.83502497764698]
We develop a hierarchical AI system called Fashion Matrix dedicated to editing photos by just talking.
Fashion Matrix employs Large Language Models (LLMs) as its foundational support and engages in iterative interactions with users.
Visual Foundation Models are leveraged to generate edited images from text prompts and masks, thereby facilitating the automation of fashion editing processes.
arXiv Detail & Related papers (2023-07-25T04:06:25Z)
- FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning [66.38951790650887]
Multimodal tasks in the fashion domain have significant potential for e-commerce.
We propose a novel fashion-specific pre-training framework based on weakly-supervised triplets constructed from fashion image-text pairs.
We show the triplet-based tasks are an effective addition to standard multimodal pre-training tasks.
arXiv Detail & Related papers (2022-10-26T21:01:19Z)
- M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing [51.033376763225675]
We adapt style prior knowledge and flexibility of multi-modal control into one unified two-stage framework, M6-Fashion, focusing on the practical AI-aided Fashion design.
M6-Fashion utilizes self-correction for the non-autoregressive generation to improve inference speed, enhance holistic consistency, and support various signal controls.
arXiv Detail & Related papers (2022-05-24T01:18:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.