DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder
- URL: http://arxiv.org/abs/2412.17644v3
- Date: Sat, 18 Jan 2025 11:08:25 GMT
- Title: DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder
- Authors: Ente Lin, Xujie Zhang, Fuwei Zhao, Yuxuan Luo, Xin Dong, Long Zeng, Xiaodan Liang
- Abstract summary: Diffusion models for garment-centric human generation from text or image prompts have garnered increasing attention.
We propose DreamFit, which incorporates a lightweight Anything-Dressing Encoder specifically tailored for garment-centric human generation.
Our model generalizes surprisingly well to a wide range of (non-)garments, creative styles, and prompt instructions, consistently delivering high-quality results.
- Score: 51.09561183696647
- License:
- Abstract: Diffusion models for garment-centric human generation from text or image prompts have garnered increasing attention for their great application potential. However, existing methods often face a dilemma: lightweight approaches, such as adapters, are prone to generating inconsistent textures, while finetuning-based methods involve high training costs and struggle to maintain the generalization capabilities of pretrained diffusion models, limiting their performance across diverse scenarios. To address these challenges, we propose DreamFit, which incorporates a lightweight Anything-Dressing Encoder specifically tailored for garment-centric human generation. DreamFit has three key advantages: (1) \textbf{Lightweight training}: with the proposed adaptive attention and LoRA modules, DreamFit significantly minimizes the model complexity to 83.4M trainable parameters. (2) \textbf{Anything-Dressing}: our model generalizes surprisingly well to a wide range of (non-)garments, creative styles, and prompt instructions, consistently delivering high-quality results across diverse scenarios. (3) \textbf{Plug-and-play}: DreamFit is engineered for smooth integration with any community control plugins for diffusion models, ensuring easy compatibility and minimizing adoption barriers. To further enhance generation quality, DreamFit leverages pretrained large multi-modal models (LMMs) to enrich the prompt with fine-grained garment descriptions, thereby reducing the prompt gap between training and inference. We conduct comprehensive experiments on both $768 \times 512$ high-resolution benchmarks and in-the-wild images. DreamFit surpasses all existing methods, highlighting its state-of-the-art capabilities in garment-centric human generation.
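The lightweight-training claim rests on keeping the pretrained diffusion backbone frozen and training only small low-rank (LoRA) modules. Below is a minimal PyTorch sketch of that general mechanism under assumptions of our own: the LoRALinear class, the rank, and the choice of key/value projections are illustrative and do not reproduce DreamFit's Anything-Dressing Encoder or its adaptive attention.

```python
# Minimal PyTorch sketch of the LoRA idea described in the abstract: freeze a
# pretrained projection and train only a low-rank residual, so the trainable
# parameter count stays small. All names and dimensions are illustrative; this
# is not DreamFit's actual implementation.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 16, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # pretrained weights stay frozen
            p.requires_grad = False
        self.down = nn.Linear(base.in_features, rank, bias=False)  # A: d_in -> r
        self.up = nn.Linear(rank, base.out_features, bias=False)   # B: r -> d_out
        nn.init.zeros_(self.up.weight)  # zero update at init, so output matches the base layer
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))


# Hypothetical usage: attach LoRA to the key/value projections of a cross-attention
# block, a common place for adapter-style garment conditioning (an assumption here,
# not DreamFit's documented attention layout).
hidden = 1024
to_k = LoRALinear(nn.Linear(hidden, hidden), rank=16)
to_v = LoRALinear(nn.Linear(hidden, hidden), rank=16)

trainable = sum(p.numel() for m in (to_k, to_v)
                for p in m.parameters() if p.requires_grad)
print(f"trainable parameters in this block: {trainable:,}")  # only the LoRA weights
```

Counting only parameters with requires_grad=True, as in the last lines, is how a trainable-parameter budget such as the reported 83.4M would typically be measured.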
Related papers
- DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On [103.89972383310715]
DiffusionTrend harnesses latents rich in prior information to capture the nuances of garment details.
It delivers a visually compelling try-on experience, underscoring the potential of training-free diffusion models.
arXiv Detail & Related papers (2024-12-19T02:24:35Z)
- Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior [8.314155285516073]
MOPED is the first method to leverage a novel multi-modal conditional diffusion model as a prior for SMPL pose parameters.
Our method offers powerful unconditional pose generation with the ability to condition on multi-modal inputs such as images and text.
arXiv Detail & Related papers (2024-10-18T15:29:19Z)
- PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation.
It employs a novel score distillation algorithm to achieve balanced saturation.
arXiv Detail & Related papers (2024-07-19T02:00:04Z)
- AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [50.62711489896909]
AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large margin.
AnyFit's impressive performance on high-fidelity virtual try-on in any scenario from any image paves a new path for future research within the fashion community.
arXiv Detail & Related papers (2024-05-28T13:33:08Z)
- DREAM: Diffusion Rectification and Estimation-Adaptive Models [50.66535824749801]
We present DREAM, a novel training framework representing Diffusion Rectification and Estimation Adaptive Models.
DREAM features two components: diffusion rectification, which adjusts training to reflect the sampling process, and estimation adaptation, which balances perception against distortion.
arXiv Detail & Related papers (2023-11-30T21:44:39Z)
- HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models [58.39439948383928]
HyperDreamBooth is a hypernetwork capable of efficiently generating a small set of personalized weights from a single image.
Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion.
arXiv Detail & Related papers (2023-07-13T17:59:47Z)
- SMPLicit: Topology-aware Generative Model for Clothed People [65.84665248796615]
We introduce SMPLicit, a novel generative model to jointly represent body pose, shape and clothing geometry.
In the experimental section, we demonstrate SMPLicit can be readily used for fitting 3D scans and for 3D reconstruction in images of dressed people.
arXiv Detail & Related papers (2021-03-11T18:57:03Z)