ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
- URL: http://arxiv.org/abs/2412.17811v2
- Date: Sat, 28 Dec 2024 02:24:34 GMT
- Title: ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
- Authors: Siyuan Bian, Chenghao Xu, Yuliang Xiu, Artur Grigorev, Zhen Liu, Cewu Lu, Michael J. Black, Yao Feng
- Abstract summary: ChatGarment is a novel approach that leverages large vision-language models (VLMs) to automate the estimation, generation, and editing of 3D garments.
It can estimate sewing patterns from in-the-wild images or sketches, generate them from text descriptions, and edit garments based on user instructions.
- Score: 79.46056192947924
- Abstract: We introduce ChatGarment, a novel approach that leverages large vision-language models (VLMs) to automate the estimation, generation, and editing of 3D garments from images or text descriptions. Unlike previous methods that struggle in real-world scenarios or lack interactive editing capabilities, ChatGarment can estimate sewing patterns from in-the-wild images or sketches, generate them from text descriptions, and edit garments based on user instructions, all within an interactive dialogue. These sewing patterns can then be draped into 3D garments, which are easily animatable and simulatable. This is achieved by finetuning a VLM to directly generate a JSON file that includes both textual descriptions of garment types and styles, as well as continuous numerical attributes. This JSON file is then used to create sewing patterns through a programming parametric model. To support this, we refine the existing programming model, GarmentCode, by expanding its garment type coverage and simplifying its structure for efficient VLM fine-tuning. Additionally, we construct a large-scale dataset of image-to-sewing-pattern and text-to-sewing-pattern pairs through an automated data pipeline. Extensive evaluations demonstrate ChatGarment's ability to accurately reconstruct, generate, and edit garments from multimodal inputs, highlighting its potential to revolutionize workflows in fashion and gaming applications. Code and data will be available at https://chatgarment.github.io/.
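The abstract describes the VLM's output as a JSON file that mixes textual garment descriptors with continuous numerical attributes, which a programmatic parametric model (a refined GarmentCode) then turns into sewing patterns. As a rough illustration only, the minimal Python sketch below shows what such a spec could look like; the field names and values are assumptions for exposition, not ChatGarment's or GarmentCode's actual schema.

```python
import json

# Hypothetical garment spec of the kind a finetuned VLM might emit:
# discrete garment types/styles as text fields plus continuous numeric
# attributes. All field names here are illustrative assumptions.
garment_spec = {
    "upper_garment": {
        "type": "shirt",
        "sleeve_style": "long",
        "collar": "round",
        "length_ratio": 0.62,        # continuous attribute in [0, 1]
        "sleeve_length_ratio": 0.95,
        "width_ease_cm": 8.5,
    },
    "lower_garment": {
        "type": "skirt",
        "silhouette": "a-line",
        "length_ratio": 0.48,
        "waist_ease_cm": 2.0,
    },
}

# The JSON string would then be handed to a GarmentCode-style parametric
# program that produces 2D sewing-pattern panels, which are subsequently
# draped into an animatable, simulatable 3D garment.
print(json.dumps(garment_spec, indent=2))
```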
Related papers
- Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics [27.697150953628572]
This paper focuses on 3D garment generation, a key area for applications like virtual try-on with dynamic garment animations.
We introduce Dress-1-to-3, a novel pipeline that reconstructs physics-plausible, simulation-ready separated garments with sewing patterns and humans from an in-the-wild image.
arXiv Detail & Related papers (2025-02-05T18:49:03Z)
- AIpparel: A Large Multimodal Generative Model for Digital Garments [71.12933771326279]
We introduce AIpparel, a large multimodal model for generating and editing sewing patterns.
We fine-tune state-of-the-art large multimodal models on a custom-curated, large-scale dataset of over 120,000 unique garments.
We propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently.
arXiv Detail & Related papers (2024-12-05T07:35:19Z)
- GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns [18.513707884523072]
We present the first large-scale synthetic dataset of 3D made-to-measure garments with sewing patterns.
GarmentCodeData contains 115,000 data points that cover a variety of designs in many common garment categories.
We propose an automatic, open-source 3D garment draping pipeline based on a fast XPBD simulator (a minimal sketch of the core XPBD update appears after this list).
arXiv Detail & Related papers (2024-05-27T19:14:46Z)
- GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details [31.92583566128599]
Traditional 3D garment creation is labor-intensive, involving sketching, modeling, UV mapping, and other time-consuming processes.
We propose GarmentDreamer, a novel method that leverages 3D Gaussian Splatting (GS) as guidance to generate 3D garments from text prompts.
arXiv Detail & Related papers (2024-05-20T23:54:28Z)
- DressCode: Autoregressively Sewing and Generating Garments from Text Guidance [61.48120090970027]
DressCode aims to democratize design for novices and offer immense potential in fashion design, virtual try-on, and digital human creation.
We first introduce SewingGPT, a GPT-based architecture integrating cross-attention with text-conditioned embedding to generate sewing patterns.
We then tailor a pre-trained Stable Diffusion to generate tile-based Physically-based Rendering (PBR) textures for the garments.
arXiv Detail & Related papers (2024-01-29T16:24:21Z)
- LayoutGPT: Compositional Visual Planning and Generation with Large Language Models [98.81962282674151]
Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions.
We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language.
arXiv Detail & Related papers (2023-05-24T17:56:16Z)
- DrapeNet: Garment Generation and Self-Supervised Draping [95.0315186890655]
We rely on self-supervision to train a single network to drape multiple garments.
This is achieved by predicting a 3D deformation field conditioned on the latent codes of a generative network.
Our pipeline can generate and drape previously unseen garments of any topology.
arXiv Detail & Related papers (2022-11-21T09:13:53Z)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality, CLIP image representations and scaling of language models, do not consistently improve multimodal self-rationalization of tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)
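The GarmentCodeData entry above mentions a draping pipeline built on a fast XPBD simulator. As general background on that technique (not that paper's implementation), here is a minimal sketch of the core extended position-based dynamics (XPBD) update for a single distance constraint; the function and variable names are illustrative.

```python
import numpy as np

def xpbd_distance_constraint(x0, x1, w0, w1, rest_len, lam, compliance, dt):
    """One XPBD solver iteration for a single distance constraint
    C(x0, x1) = |x0 - x1| - rest_len, with inverse masses w0, w1
    and accumulated Lagrange multiplier lam (Macklin et al., 2016)."""
    d = x0 - x1
    dist = np.linalg.norm(d)
    if dist < 1e-9:
        return x0, x1, lam          # degenerate edge: skip update
    n = d / dist                    # constraint gradient direction
    C = dist - rest_len
    alpha_tilde = compliance / (dt * dt)
    # Lagrange multiplier increment (core XPBD update)
    dlam = (-C - alpha_tilde * lam) / (w0 + w1 + alpha_tilde)
    lam += dlam
    # Position corrections weighted by inverse masses
    x0 = x0 + w0 * dlam * n
    x1 = x1 - w1 * dlam * n
    return x0, x1, lam

# Example: relax one stretched cloth edge toward its rest length.
p0, p1 = np.array([0.0, 0.0, 0.0]), np.array([1.2, 0.0, 0.0])
p0, p1, lam = xpbd_distance_constraint(p0, p1, w0=1.0, w1=1.0,
                                       rest_len=1.0, lam=0.0,
                                       compliance=1e-6, dt=1.0 / 60.0)
```

A cloth simulator of this kind iterates such updates over all edge (stretch) and bending constraints each substep, which is what makes the generated garments simulatable after draping.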