Dress&Dance: Dress up and Dance as You Like It - Technical Preview
- URL: http://arxiv.org/abs/2508.21070v1
- Date: Thu, 28 Aug 2025 17:59:55 GMT
- Title: Dress&Dance: Dress up and Dance as You Like It - Technical Preview
- Authors: Jun-Kun Chen, Aayush Bansal, Minh Phuoc Vo, Yu-Xiong Wang,
- Abstract summary: Dress&Dance is a video diffusion framework that generates high quality 5-second-long 24 FPS virtual try-on videos.<n>Our approach requires a single user image and supports a range of tops, bottoms, and one-piece garments, as well as simultaneous tops and bottoms try-on in a single pass.
- Score: 55.78895889755938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Dress&Dance, a video diffusion framework that generates high quality 5-second-long 24 FPS virtual try-on videos at 1152x720 resolution of a user wearing desired garments while moving in accordance with a given reference video. Our approach requires a single user image and supports a range of tops, bottoms, and one-piece garments, as well as simultaneous tops and bottoms try-on in a single pass. Key to our framework is CondNet, a novel conditioning network that leverages attention to unify multi-modal inputs (text, images, and videos), thereby enhancing garment registration and motion fidelity. CondNet is trained on heterogeneous training data, combining limited video data and a larger, more readily available image dataset, in a multistage progressive manner. Dress&Dance outperforms existing open source and commercial solutions and enables a high quality and flexible try-on experience.
Related papers
- Fashion-VDM: Video Diffusion Model for Virtual Try-On [17.284966713669927]
We present Fashion-VDM, a video diffusion model (VDM) for generating virtual try-on videos.
Given an input garment image and person video, our method aims to generate a high-quality try-on video of the person wearing the given garment.
arXiv Detail & Related papers (2024-10-31T21:52:33Z) - OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person [38.69239957207417]
OutfitAnyone generates high-fidelity and detail-consistent images for virtual clothing trials.
It distinguishes itself with scalability-ulating factors such as pose, body shape and broad applicability.
OutfitAnyone's performance in diverse scenarios underscores its utility and readiness for real-world deployment.
arXiv Detail & Related papers (2024-07-23T07:04:42Z) - IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 is a virtual dressing task that generates freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
arXiv Detail & Related papers (2024-07-17T16:26:30Z) - WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z) - ViViD: Video Virtual Try-on using Diffusion Models [46.710863047471264]
Video virtual try-on aims to transfer a clothing item onto the video of a target person.
Previous video-based try-on solutions can only generate low visual quality and blurring results.
We present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on.
arXiv Detail & Related papers (2024-05-20T05:28:22Z) - High-Quality Animatable Dynamic Garment Reconstruction from Monocular
Videos [51.8323369577494]
We propose the first method to recover high-quality animatable dynamic garments from monocular videos without depending on scanned data.
To generate reasonable deformations for various unseen poses, we propose a learnable garment deformation network.
We show that our method can reconstruct high-quality dynamic garments with coherent surface details, which can be easily animated under unseen poses.
arXiv Detail & Related papers (2023-11-02T13:16:27Z) - Dressing in the Wild by Watching Dance Videos [69.7692630502019]
This paper attends to virtual try-on in real-world scenes and brings improvements in authenticity and naturalness.
We propose a novel generative network called wFlow that can effectively push up garment transfer to in-the-wild context.
arXiv Detail & Related papers (2022-03-29T08:05:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.