DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
- URL: http://arxiv.org/abs/2304.06025v4
- Date: Mon, 30 Oct 2023 21:44:52 GMT
- Title: DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
- Authors: Johanna Karras, Aleksander Holynski, Ting-Chun Wang, Ira
Kemelmacher-Shlizerman
- Abstract summary: We present DreamPose, a diffusion-based method for generating animated fashion videos from still images.
Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion.
- Score: 63.179505586264014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present DreamPose, a diffusion-based method for generating animated
fashion videos from still images. Given an image and a sequence of human body
poses, our method synthesizes a video containing both human and fabric motion.
To achieve this, we transform a pretrained text-to-image model (Stable
Diffusion) into a pose-and-image guided video synthesis model, using a novel
fine-tuning strategy, a set of architectural changes to support the added
conditioning signals, and techniques to encourage temporal consistency. We
fine-tune on a collection of fashion videos from the UBC Fashion dataset. We
evaluate our method on a variety of clothing styles and poses, and demonstrate
that our method produces state-of-the-art results on fashion video
animation. Video results are available on our project page.
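The abstract describes turning a pretrained text-to-image latent diffusion model into a pose-and-image guided video generator. The sketch below is a toy illustration of one plausible way such conditioning could be wired, assuming pose maps are concatenated to the noisy latent along the channel axis and an image embedding stands in for the text conditioning; it is not the authors' implementation, and all module names, shapes, and dimensions are assumptions.

```python
# Toy sketch (not the DreamPose code) of pose-and-image conditioned denoising.
import torch
import torch.nn as nn

class ToyPoseImageDenoiser(nn.Module):
    def __init__(self, latent_ch=4, pose_ch=3, ctx_dim=768):
        super().__init__()
        # Extra input channels accept the concatenated pose representation,
        # echoing the idea of adding conditioning channels to the UNet input.
        self.conv_in = nn.Conv2d(latent_ch + pose_ch, 64, 3, padding=1)
        self.ctx_proj = nn.Linear(ctx_dim, 64)   # injects the image embedding
        self.conv_out = nn.Conv2d(64, latent_ch, 3, padding=1)

    def forward(self, noisy_latent, pose_map, image_embed):
        x = torch.cat([noisy_latent, pose_map], dim=1)
        h = torch.relu(self.conv_in(x))
        # Broadcast-add the projected image embedding (a stand-in for
        # cross-attention over features of the input photograph).
        h = h + self.ctx_proj(image_embed)[:, :, None, None]
        return self.conv_out(h)                   # predicted noise

# One denoising call per target pose; looping over a pose sequence
# yields one frame latent per pose.
denoiser = ToyPoseImageDenoiser()
latent = torch.randn(1, 4, 64, 64)                # noisy frame latent
pose = torch.randn(1, 3, 64, 64)                  # rendered pose map for this frame
img_embed = torch.randn(1, 768)                   # embedding of the input photo
eps = denoiser(latent, pose, img_embed)
print(eps.shape)  # torch.Size([1, 4, 64, 64])
```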
Related papers
- Replace Anyone in Videos [39.4019337319795]
We propose the ReplaceAnyone framework, which focuses on localizing and manipulating human motion in videos.
Specifically, we formulate this task as an image-conditioned pose-driven video inpainting paradigm.
We introduce diverse mask forms involving regular and irregular shapes to avoid shape leakage and allow granular local control.
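As an illustration of the image-conditioned, pose-driven inpainting paradigm described above, the following sketch regenerates only the masked region of a frame latent while copying the rest from the source frame. The `denoise_step` stub, tensor shapes, and mask handling are assumptions for illustration, not the ReplaceAnyone code.

```python
# Toy sketch of masked video inpainting: only the masked region is synthesized.
import torch

def denoise_step(latent, pose_map, ref_image_feat):
    # Stand-in for one pose-and-image conditioned denoising update.
    return latent - 0.1 * torch.randn_like(latent)

def inpaint_frame(src_latent, mask, pose_map, ref_image_feat, steps=4):
    """mask: 1 inside the region to repaint (any regular or irregular shape)."""
    x = torch.randn_like(src_latent)              # masked region starts from noise
    for _ in range(steps):
        x = denoise_step(x, pose_map, ref_image_feat)
        # Re-impose the known background after every step so only the
        # masked (person) region is actually generated.
        x = mask * x + (1 - mask) * src_latent
    return x

src = torch.randn(1, 4, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.7).float()   # an irregular binary mask
out = inpaint_frame(src, mask, pose_map=None, ref_image_feat=None)
print(out.shape)  # torch.Size([1, 4, 64, 64])
```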
arXiv Detail & Related papers (2024-09-30T03:27:33Z)
- WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
- ViViD: Video Virtual Try-on using Diffusion Models [46.710863047471264]
Video virtual try-on aims to transfer a clothing item onto the video of a target person.
Previous video-based try-on solutions produce only low-quality, blurry results.
We present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on.
arXiv Detail & Related papers (2024-05-20T05:28:22Z)
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion [52.7394517692186]
We present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject.
DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model.
In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.
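As a rough sketch of the motion-learning stage described above, the snippet below trains only a small residual adapter while the pretrained backbone stays frozen; the adapter design, dimensions, and training target are assumptions for illustration, not DreamVideo's actual modules.

```python
# Toy sketch of adapter fine-tuning with a frozen backbone.
import torch
import torch.nn as nn

class MotionAdapter(nn.Module):
    def __init__(self, dim=320, hidden=64):
        super().__init__()
        # Bottleneck residual adapter: cheap to train, leaves the backbone intact.
        self.down = nn.Linear(dim, hidden)
        self.up = nn.Linear(hidden, dim)

    def forward(self, features):
        return features + self.up(torch.relu(self.down(features)))

backbone = nn.Linear(320, 320)             # stand-in for a frozen diffusion block
for p in backbone.parameters():
    p.requires_grad_(False)                # backbone weights stay fixed

adapter = MotionAdapter()
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

feats = torch.randn(8, 320)                # features from frames of the given video
target = torch.randn(8, 320)               # stand-in training target (e.g. noise)
loss = ((adapter(backbone(feats)) - target) ** 2).mean()
loss.backward()                            # gradients flow only into the adapter
optimizer.step()
```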
arXiv Detail & Related papers (2023-12-07T16:57:26Z)
- DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors [63.43133768897087]
We propose a method to convert open-domain images into animated videos.
The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance.
Our method produces visually convincing, more logical and natural motions, as well as higher conformity to the input image.
arXiv Detail & Related papers (2023-10-18T14:42:16Z)
- FashionFlow: Leveraging Diffusion Models for Dynamic Fashion Video Synthesis from Static Imagery [3.3063015889158716]
This study introduces a new image-to-video generator called FashionFlow to generate fashion videos.
By utilising a diffusion model, we are able to create short videos from still fashion images.
arXiv Detail & Related papers (2023-09-29T19:34:32Z)
- Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model [57.855362366674264]
We propose Dancing Avatar, designed to fabricate human motion videos driven by poses and textual cues.
Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion.
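The autoregressive frame-by-frame generation described above can be pictured as a simple loop in which each frame is conditioned on the current pose, the text cue, and the previously generated frame; `generate_frame` below is a hypothetical stand-in, not the Dancing Avatar model.

```python
# Toy sketch of autoregressive frame synthesis from a pose sequence.
import torch

def generate_frame(pose, text_embed, prev_frame):
    # Stand-in for one pose/text-guided image diffusion sample.
    base = torch.randn(3, 64, 64)
    return 0.7 * prev_frame + 0.3 * base   # toy blend just to keep the loop runnable

def synthesize_video(poses, text_embed):
    frames, prev = [], torch.zeros(3, 64, 64)
    for pose in poses:                      # one autoregressive step per target pose
        prev = generate_frame(pose, text_embed, prev)
        frames.append(prev)
    return torch.stack(frames)

video = synthesize_video(poses=[None] * 16, text_embed=None)
print(video.shape)  # torch.Size([16, 3, 64, 64])
```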
arXiv Detail & Related papers (2023-08-15T13:00:42Z)
- Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer [13.098901971644656]
This paper proposes a zero-shot video stylization method named Style-A-Video.
It uses a generative pre-trained transformer with an image latent diffusion model to achieve concise, text-controlled video stylization.
Tests show that it attains superior content preservation and stylistic performance while incurring lower computational cost than previous solutions.
arXiv Detail & Related papers (2023-05-09T14:03:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.