AIpparel: A Large Multimodal Generative Model for Digital Garments
- URL: http://arxiv.org/abs/2412.03937v3
- Date: Mon, 16 Dec 2024 02:39:18 GMT
- Title: AIpparel: A Large Multimodal Generative Model for Digital Garments
- Authors: Kiyohiro Nakayama, Jan Ackermann, Timur Levent Kesdogan, Yang Zheng, Maria Korosteleva, Olga Sorkine-Hornung, Leonidas J. Guibas, Guandao Yang, Gordon Wetzstein
- Abstract summary: We introduce AIpparel, a large multimodal model for generating and editing sewing patterns.
Our model fine-tunes state-of-the-art large multimodal models on a custom-curated large-scale dataset of over 120,000 unique garments.
We propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently.
- Score: 71.12933771326279
- License:
- Abstract: Apparel is essential to human life, offering protection, mirroring cultural identities, and showcasing personal style. Yet, the creation of garments remains a time-consuming process, largely due to the manual work involved in designing them. To simplify this process, we introduce AIpparel, a large multimodal model for generating and editing sewing patterns. Our model fine-tunes state-of-the-art large multimodal models (LMMs) on a custom-curated large-scale dataset of over 120,000 unique garments, each with multimodal annotations including text, images, and sewing patterns. Additionally, we propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently. AIpparel achieves state-of-the-art performance in single-modal tasks, including text-to-garment and image-to-garment prediction, and enables novel multimodal garment generation applications such as interactive garment editing. The project website is at georgenakayama.github.io/AIpparel/.
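The abstract does not say how the tokenization works. The general idea is to turn a sewing pattern into a flat sequence of discrete tokens that a language model can predict and that can be decoded back into geometry. The minimal Python sketch below illustrates only that idea and is not AIpparel's scheme. Everything in it is a hypothetical assumption: the straight-edge polygon panels, the `<panel>`/`<xN>`/`<yN>` token vocabulary, the 256-bin quantization, and the ±100 cm coordinate range.

```python
from dataclasses import dataclass

# Hypothetical toy representation: each panel is a closed polygon of 2D vertices (cm).
# This is NOT AIpparel's tokenization; it only shows how a pattern can be serialized
# into discrete tokens and decoded back.

@dataclass
class Panel:
    name: str
    vertices: list[tuple[float, float]]

NUM_BINS = 256                  # assumed quantization resolution per axis
COORD_RANGE = (-100.0, 100.0)   # assumed coordinate range in cm

def quantize(value: float) -> int:
    """Map a coordinate to an integer bin in [0, NUM_BINS - 1]."""
    lo, hi = COORD_RANGE
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * (NUM_BINS - 1))

def dequantize(index: int) -> float:
    """Map a bin index back to a coordinate in cm."""
    lo, hi = COORD_RANGE
    return lo + index / (NUM_BINS - 1) * (hi - lo)

def tokenize(panels: list[Panel]) -> list[str]:
    """Serialize panels as <pattern> <panel> <xN> <yN> ... </panel> ... </pattern>."""
    tokens = ["<pattern>"]
    for panel in panels:
        tokens.append("<panel>")
        for x, y in panel.vertices:
            tokens += [f"<x{quantize(x)}>", f"<y{quantize(y)}>"]
        tokens.append("</panel>")
    tokens.append("</pattern>")
    return tokens

def detokenize(tokens: list[str]) -> list[list[tuple[float, float]]]:
    """Decode a token sequence back into lists of (approximate) panel vertices."""
    panels, current, pending_x = [], None, None
    for tok in tokens:
        if tok == "<panel>":
            current = []
        elif tok == "</panel>":
            panels.append(current)
            current = None
        elif tok.startswith("<x"):
            pending_x = dequantize(int(tok[2:-1]))
        elif tok.startswith("<y"):
            current.append((pending_x, dequantize(int(tok[2:-1]))))
    return panels
```

With 256 bins over a 200 cm range, each coordinate round-trips with an error of at most about 0.4 cm (half a bin). A real scheme would also have to encode curved edges and stitch information, which this toy omits.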
Related papers
- ChatGarment: Garment Estimation, Generation and Editing via Large Language Models [79.46056192947924]
ChatGarment is a novel approach that leverages large vision-language models (VLMs) to automate the estimation, generation, and editing of 3D garments.
It can estimate sewing patterns from in-the-wild images or sketches, generate them from text descriptions, and edit garments based on user instructions.
(arXiv, 2024-12-23T18:59:28Z)
- Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation [52.13927859375693]
We propose SewingLDM, a multi-modal generative model that generates sewing patterns controlled by text prompts, body shapes, and garment sketches.
To learn the sewing pattern distribution in the latent space, we design a two-step training strategy.
Comprehensive qualitative and quantitative experiments show the effectiveness of our proposed method.
(arXiv, 2024-12-19T02:05:28Z)
- Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis [27.1965932507935]
We propose a novel sewing pattern generation approach based on Large Multimodal Models (LMMs).
LMMs offer an intuitive interface for interpreting diverse design inputs.
Pattern-making programs could serve as well-structured and semantically meaningful representations of sewing patterns.
(arXiv, 2024-12-11T18:26:45Z)
- Multi-Garment Customized Model Generation [3.1679243514285194]
Multi-Garment Customized Model Generation is a unified framework based on Latent Diffusion Models (LDMs).
Our framework supports the conditional generation of multiple garments through decoupled multi-garment feature fusion.
The proposed garment encoder is a plug-and-play module that can be combined with other extension modules.
(arXiv, 2024-08-09T17:57:33Z)
- SEED-Story: Multimodal Long Story Generation with Large Language Model [66.37077224696242]
SEED-Story is a novel method that leverages a Multimodal Large Language Model (MLLM) to generate extended multimodal stories.
We propose a multimodal attention sink mechanism to enable the generation of stories with up to 25 sequences (only 10 for training) in a highly efficient autoregressive manner.
We present a large-scale and high-resolution dataset named StoryStream for training our model and quantitatively evaluating the task of multimodal story generation in various aspects.
(arXiv, 2024-07-11T17:21:03Z)
- Matryoshka Multimodal Models [92.41824727506751]
We propose M3: Matryoshka Multimodal Models, which learns to represent visual content as nested sets of visual tokens.
We find that COCO-style benchmarks only need around 9 visual tokens to obtain accuracy similar to that of using all 576 tokens.
(arXiv, 2024-05-27T17:59:56Z)
- Towards Garment Sewing Pattern Reconstruction from a Single Image [76.97825595711444]
A garment's sewing pattern represents its intrinsic rest shape and is central to many applications, such as fashion design, virtual try-on, and digital avatars.
We first synthesize a versatile dataset, named SewFactory, which consists of around 1M images and ground-truth sewing patterns.
We then propose a two-level Transformer network called Sewformer, which significantly improves the sewing pattern prediction performance.
(arXiv, 2023-11-07T18:59:51Z)
- Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
We propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images.
We tackle this problem by proposing a new architecture based on latent diffusion models.
Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets.
(arXiv, 2023-04-04T18:03:04Z)
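The Matryoshka Multimodal Models entry describes visual content as nested sets of visual tokens. That idea can be sketched in plain Python. The sketch assumes the coarser sets come from average-pooling a 24×24 grid of visual tokens (576 in total) down to 12×12, 6×6, 3×3, and 1×1. The pooling schedule and the function names are illustrative assumptions, not the M3 implementation.

```python
def avg_pool_grid(grid, k):
    """Average-pool a square grid (rows of feature vectors) with kernel = stride = k."""
    n = len(grid)
    assert n % k == 0, "grid side must be divisible by the kernel size"
    dim = len(grid[0][0])
    out = []
    for i in range(0, n, k):
        row = []
        for j in range(0, n, k):
            acc = [0.0] * dim
            # sum the k x k block of feature vectors element-wise
            for di in range(k):
                for dj in range(k):
                    for d, v in enumerate(grid[i + di][j + dj]):
                        acc[d] += v
            row.append([a / (k * k) for a in acc])
        out.append(row)
    return out

def matryoshka_token_sets(grid, kernels=(2, 2, 2, 3)):
    """Return nested token sets from fine to coarse (assumed 24x24 -> 12x12 -> 6x6 -> 3x3 -> 1x1)."""
    levels = [grid]
    for k in kernels:
        levels.append(avg_pool_grid(levels[-1], k))
    # flatten each 2D level into a 1D token sequence (row-major order)
    return [[tok for row in level for tok in row] for level in levels]
```

Each coarser set summarizes the finer one, so the number of visual tokens passed to the language model can be chosen per request, trading detail for sequence length. According to the summary above, about 9 tokens (the 3×3 level here) are enough for COCO-style benchmarks.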
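The attention sink in the SEED-Story entry comes from a general KV-cache policy introduced with StreamingLLM. Under that policy, the first few tokens are always kept because they absorb a disproportionate share of attention. A sliding window of the most recent tokens is kept as well, so generation can run past the training length with bounded memory. The sketch below shows only that generic policy and does not reproduce SEED-Story's multimodal variant. The class name and parameters are hypothetical, and each cache entry stands in for one token's key/value pair.

```python
from collections import deque

class SinkKVCache:
    """Generic attention-sink cache: keep the first `num_sink` entries forever
    plus a sliding window of the `window` most recent entries."""

    def __init__(self, num_sink: int, window: int):
        self.num_sink = num_sink
        self.sink: list = []
        self.recent: deque = deque(maxlen=window)

    def append(self, entry) -> None:
        if len(self.sink) < self.num_sink:
            self.sink.append(entry)      # the earliest tokens become permanent sinks
        else:
            self.recent.append(entry)    # deque drops the oldest entry once full

    def visible(self) -> list:
        """Entries that new tokens may attend to: sinks first, then the recent window."""
        return self.sink + list(self.recent)
```

For example, with `num_sink=4` and `window=8`, after appending tokens 0 through 99 the visible cache holds tokens 0 to 3 followed by 92 to 99. Memory stays at 12 entries however long generation runs.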
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.