ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
- URL: http://arxiv.org/abs/2412.17811v3
- Date: Thu, 03 Apr 2025 09:47:55 GMT
- Title: ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
- Authors: Siyuan Bian, Chenghao Xu, Yuliang Xiu, Artur Grigorev, Zhen Liu, Cewu Lu, Michael J. Black, Yao Feng,
- Abstract summary: ChatGarment is a novel approach that leverages large vision-language models (VLMs) to automate the estimation, generation, and editing of 3D garments. It can estimate sewing patterns from in-the-wild images or sketches, generate them from text descriptions, and edit garments based on user instructions.
- Score: 79.46056192947924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce ChatGarment, a novel approach that leverages large vision-language models (VLMs) to automate the estimation, generation, and editing of 3D garments from images or text descriptions. Unlike previous methods that struggle in real-world scenarios or lack interactive editing capabilities, ChatGarment can estimate sewing patterns from in-the-wild images or sketches, generate them from text descriptions, and edit garments based on user instructions, all within an interactive dialogue. These sewing patterns can then be draped on a 3D body and animated. This is achieved by finetuning a VLM to directly generate a JSON file that includes both textual descriptions of garment types and styles, as well as continuous numerical attributes. This JSON file is then used to create sewing patterns through a programming parametric model. To support this, we refine the existing programming model, GarmentCode, by expanding its garment type coverage and simplifying its structure for efficient VLM fine-tuning. Additionally, we construct a large-scale dataset of image-to-sewing-pattern and text-to-sewing-pattern pairs through an automated data pipeline. Extensive evaluations demonstrate ChatGarment's ability to accurately reconstruct, generate, and edit garments from multimodal inputs, highlighting its potential to simplify workflows in fashion and gaming applications. Code and data are available at https://chatgarment.github.io/ .
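The abstract describes a pipeline in which the fine-tuned VLM outputs a JSON file that mixes discrete garment/style descriptions with continuous numerical attributes, and a programming parametric model (GarmentCode) then turns that JSON into a sewing pattern. The sketch below illustrates this idea only; the field names, value ranges, and validation step are illustrative assumptions, not the actual ChatGarment schema or API.

```python
import json

# Hypothetical example of the kind of structured output the paper describes:
# discrete garment/style labels plus continuous numerical attributes.
# These field names are assumptions for illustration.
example_vlm_output = """
{
    "garment_type": "dress",
    "style": {
        "neckline": "v-neck",
        "sleeve_type": "puff"
    },
    "attributes": {
        "sleeve_length": 0.35,
        "skirt_length": 0.72,
        "waist_looseness": 0.10
    }
}
"""

def validate_garment_spec(raw_json: str) -> dict:
    """Parse and sanity-check a garment spec before handing it to a
    parametric pattern generator (hypothetical downstream step)."""
    spec = json.loads(raw_json)
    for key in ("garment_type", "style", "attributes"):
        if key not in spec:
            raise ValueError(f"missing required field: {key}")
    # Continuous attributes are assumed here to be normalized to [0, 1].
    for name, value in spec["attributes"].items():
        if not 0.0 <= float(value) <= 1.0:
            raise ValueError(f"attribute {name} out of range: {value}")
    return spec

if __name__ == "__main__":
    spec = validate_garment_spec(example_vlm_output)
    print(spec["garment_type"], spec["attributes"])
```

The appeal of such a JSON intermediate, as the paper argues, is that the language model only has to emit a compact, human-readable specification, while the parametric model handles the geometric details of pattern construction.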
Related papers
- GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers [9.228577662928673]
Generative modeling of sewing patterns is crucial for creating diversified garments.
We present GarmentDiffusion, a new generative model capable of producing centimeter-precise, vectorized 3D sewing patterns.
arXiv Detail & Related papers (2025-04-30T09:56:59Z) - DiffusedWrinkles: A Diffusion-Based Model for Data-Driven Garment Animation [10.9550231281676]
We present a data-driven method for learning to generate animations of 3D garments using a 2D image diffusion model.
Our approach is able to synthesize high-quality 3D animations for a wide variety of garments and body shapes.
arXiv Detail & Related papers (2025-03-24T06:08:26Z) - AIpparel: A Large Multimodal Generative Model for Digital Garments [71.12933771326279]
We introduce AIpparel, a large multimodal model for generating and editing sewing patterns. Our model fine-tunes state-of-the-art large multimodal models on a custom-curated large-scale dataset of over 120,000 unique garments. We propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently.
arXiv Detail & Related papers (2024-12-05T07:35:19Z) - GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details [31.92583566128599]
Traditional 3D garment creation is labor-intensive, involving sketching, modeling, UV mapping, and other time-consuming steps.
We propose GarmentDreamer, a novel method that leverages 3D Gaussian Splatting (GS) as guidance to generate 3D garments from text prompts.
arXiv Detail & Related papers (2024-05-20T23:54:28Z) - DressCode: Autoregressively Sewing and Generating Garments from Text Guidance [61.48120090970027]
DressCode aims to democratize design for novices and offer immense potential in fashion design, virtual try-on, and digital human creation.
We first introduce SewingGPT, a GPT-based architecture integrating cross-attention with text-conditioned embedding to generate sewing patterns.
We then tailor a pre-trained Stable Diffusion to generate tile-based Physically-based Rendering (PBR) textures for the garments.
arXiv Detail & Related papers (2024-01-29T16:24:21Z) - GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs [74.98581417902201]
We propose a novel framework to generate compositional 3D scenes from scene graphs.
By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model.
We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer.
arXiv Detail & Related papers (2023-11-30T18:59:58Z) - LayoutGPT: Compositional Visual Planning and Generation with Large Language Models [98.81962282674151]
Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions.
We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language.
arXiv Detail & Related papers (2023-05-24T17:56:16Z) - On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality, namely CLIP image representations and the scaling of language models, do not consistently improve self-rationalization on tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)