Multimodal Garment Designer: Human-Centric Latent Diffusion Models for
Fashion Image Editing
- URL: http://arxiv.org/abs/2304.02051v2
- Date: Wed, 23 Aug 2023 12:45:27 GMT
- Title: Multimodal Garment Designer: Human-Centric Latent Diffusion Models for
Fashion Image Editing
- Authors: Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia,
Marco Bertini, Rita Cucchiara
- Abstract summary: We propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images.
We tackle this problem by proposing a new architecture based on latent diffusion models.
Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets.
- Score: 40.70752781891058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Fashion illustration is used by designers to communicate their vision and to
bring the design idea from conceptualization to realization, showing how
clothes interact with the human body. In this context, computer vision can thus
be used to improve the fashion design process. Differently from previous works
that mainly focused on the virtual try-on of garments, we propose the task of
multimodal-conditioned fashion image editing, guiding the generation of
human-centric fashion images by following multimodal prompts, such as text,
human body poses, and garment sketches. We tackle this problem by proposing a
new architecture based on latent diffusion models, an approach that has not
been used before in the fashion domain. Given the lack of existing datasets
suitable for the task, we also extend two existing fashion datasets, namely
Dress Code and VITON-HD, with multimodal annotations collected in a
semi-automatic manner. Experimental results on these new datasets demonstrate
the effectiveness of our proposal, both in terms of realism and coherence with
the given multimodal inputs. Source code and collected multimodal annotations
are publicly available at:
https://github.com/aimagelab/multimodal-garment-designer.
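The abstract describes conditioning a latent diffusion denoiser on several modalities at once (text, body pose, garment sketch). As a rough illustration of that idea only, and not the paper's actual architecture, the toy module below concatenates spatial conditions (pose, sketch) to the noisy latent along the channel axis and injects a projected text embedding additively; the class name, channel sizes, and the additive text injection (a simplification of cross-attention) are all hypothetical.

```python
import torch
import torch.nn as nn

class ToyMultimodalDenoiser(nn.Module):
    """Hypothetical stand-in for a multimodal-conditioned denoising network.

    Spatial conditions (pose map, sketch map) are concatenated to the noisy
    latent on the channel axis; the text embedding is projected and added,
    broadcast over spatial dimensions.
    """
    def __init__(self, latent_ch=4, pose_ch=1, sketch_ch=1, text_dim=32):
        super().__init__()
        in_ch = latent_ch + pose_ch + sketch_ch
        self.conv = nn.Conv2d(in_ch, latent_ch, kernel_size=3, padding=1)
        self.text_proj = nn.Linear(text_dim, latent_ch)

    def forward(self, noisy_latent, pose, sketch, text_emb):
        # Stack latent and spatial conditions channel-wise.
        x = torch.cat([noisy_latent, pose, sketch], dim=1)
        h = self.conv(x)
        # Broadcast the projected text embedding over spatial dims.
        t = self.text_proj(text_emb)[:, :, None, None]
        return h + t  # predicted noise, same shape as the latent

model = ToyMultimodalDenoiser()
latent = torch.randn(2, 4, 8, 8)
pose = torch.randn(2, 1, 8, 8)
sketch = torch.randn(2, 1, 8, 8)
text = torch.randn(2, 32)
eps_hat = model(latent, pose, sketch, text)
print(eps_hat.shape)  # torch.Size([2, 4, 8, 8])
```

In practice, systems like the one described use a U-Net with cross-attention for text and richer condition encoders; this sketch only shows the shape bookkeeping of multimodal conditioning.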
Related papers
- AIpparel: A Large Multimodal Generative Model for Digital Garments [71.12933771326279]
We introduce AIpparel, a large multimodal model for generating and editing sewing patterns.
Our model fine-tunes state-of-the-art large multimodal models on a custom-curated large-scale dataset of over 120,000 unique garments.
We propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently.
arXiv Detail & Related papers (2024-12-05T07:35:19Z)
- UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation [29.489516715874306]
We present UniFashion, a unified framework that simultaneously tackles the challenges of multimodal generation and retrieval tasks within the fashion domain.
Our model significantly outperforms previous single-task state-of-the-art models across diverse fashion tasks.
arXiv Detail & Related papers (2024-08-21T03:17:20Z)
- FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion [11.646594594565098]
This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models.
We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data.
arXiv Detail & Related papers (2024-04-26T14:59:42Z)
- Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
This paper tackles the task of multimodal-conditioned fashion image editing.
Our approach aims to generate human-centric fashion images guided by multimodal prompts, including text, human body poses, garment sketches, and fabric textures.
arXiv Detail & Related papers (2024-03-21T20:43:10Z)
- HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models [17.74292177764933]
We propose a novel hierarchical diffusion-based framework tailored for fashion design, coined as HieraFashDiff.
Our model is designed to mimic the practical fashion design workflow by unraveling the denoising process into two successive stages.
Our model supports fashion design generation and fine-grained local editing in a single framework.
arXiv Detail & Related papers (2024-01-15T03:38:57Z)
- Fashion Matrix: Editing Photos by Just Talking [66.83502497764698]
We develop a hierarchical AI system called Fashion Matrix dedicated to editing photos by just talking.
Fashion Matrix employs Large Language Models (LLMs) as its foundational support and engages in iterative interactions with users.
Visual Foundation Models are leveraged to generate edited images from text prompts and masks, thereby facilitating the automation of fashion editing processes.
arXiv Detail & Related papers (2023-07-25T04:06:25Z)
- FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning [66.38951790650887]
Multimodal tasks in the fashion domain have significant potential for e-commerce.
We propose a novel fashion-specific pre-training framework based on weakly-supervised triplets constructed from fashion image-text pairs.
We show the triplet-based tasks are an effective addition to standard multimodal pre-training tasks.
arXiv Detail & Related papers (2022-10-26T21:01:19Z)
- M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing [51.033376763225675]
We adapt style prior knowledge and flexibility of multi-modal control into one unified two-stage framework, M6-Fashion, focusing on the practical AI-aided Fashion design.
M6-Fashion utilizes self-correction for the non-autoregressive generation to improve inference speed, enhance holistic consistency, and support various signal controls.
arXiv Detail & Related papers (2022-05-24T01:18:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.