FashionFlow: Leveraging Diffusion Models for Dynamic Fashion Video
Synthesis from Static Imagery
- URL: http://arxiv.org/abs/2310.00106v2
- Date: Sat, 20 Jan 2024 09:57:47 GMT
- Title: FashionFlow: Leveraging Diffusion Models for Dynamic Fashion Video
Synthesis from Static Imagery
- Authors: Tasin Islam, Alina Miron, XiaoHui Liu, Yongmin Li
- Abstract summary: This study introduces a new image-to-video generator, FashionFlow, for synthesising fashion videos.
By utilising a diffusion model, we create short videos from still fashion images.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our study introduces a new image-to-video generator, FashionFlow,
for synthesising fashion videos. By utilising a diffusion model, we create
short videos from still fashion images. Our approach involves developing and
connecting relevant components with the diffusion model, resulting in
high-fidelity videos that are aligned with the conditional image. These
components include pseudo-3D convolutional layers, which generate videos
efficiently, and VAE and CLIP encoders, which capture vital characteristics
from the still image to condition the diffusion model at a global level. Our
research demonstrates the successful synthesis of fashion videos featuring
models posing from various angles, showcasing the fit and appearance of the
garment. Our findings hold great promise for enhancing the shopping experience
in the online fashion industry.
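The pseudo-3D convolution mentioned in the abstract is commonly implemented by
factorising a full 3D convolution into a 2D convolution over each frame's
spatial dimensions followed by a 1D convolution across the temporal axis,
which is far cheaper than a dense 3D kernel. The PyTorch sketch below
illustrates this factorisation; the class name, kernel sizes, and tensor
layout are illustrative assumptions, not FashionFlow's exact implementation.

```python
import torch
import torch.nn as nn


class Pseudo3DConv(nn.Module):
    """Factorised (pseudo-3D) convolution: a 2D convolution over each frame's
    spatial dimensions followed by a 1D convolution across time. An
    illustrative sketch; FashionFlow's actual layers may differ."""

    def __init__(self, in_channels, out_channels,
                 spatial_kernel=3, temporal_kernel=3):
        super().__init__()
        self.spatial = nn.Conv2d(in_channels, out_channels,
                                 kernel_size=spatial_kernel,
                                 padding=spatial_kernel // 2)
        self.temporal = nn.Conv1d(out_channels, out_channels,
                                  kernel_size=temporal_kernel,
                                  padding=temporal_kernel // 2)

    def forward(self, x):
        # x has shape (batch, channels, frames, height, width).
        b, c, f, h, w = x.shape
        # Spatial pass: fold frames into the batch dimension so a plain
        # Conv2d runs on every frame independently.
        x = x.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        x = self.spatial(x)
        c_out = x.shape[1]
        # Temporal pass: fold spatial positions into the batch dimension so
        # a plain Conv1d mixes information across frames at each pixel.
        x = x.reshape(b, f, c_out, h, w).permute(0, 3, 4, 2, 1)
        x = x.reshape(b * h * w, c_out, f)
        x = self.temporal(x)
        # Restore (batch, channels, frames, height, width).
        return x.reshape(b, h, w, c_out, f).permute(0, 3, 4, 1, 2)


video = torch.randn(2, 64, 8, 32, 32)  # 8 frames of 64-channel 32x32 maps
out = Pseudo3DConv(64, 128)(video)
print(out.shape)  # torch.Size([2, 128, 8, 32, 32])
```

Reusing standard 2D and 1D convolutions in this way avoids the cost of a
dense 3D kernel, which is the usual efficiency argument for pseudo-3D designs.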
Related papers
- ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning
We propose a novel post-tuning methodology for video synthesis models, called ExVideo.
This approach is designed to enhance the capability of current video synthesis models, allowing them to produce content over extended temporal durations.
Our approach augments the model's capacity to generate up to $5\times$ its original number of frames, requiring only 1.5k GPU hours of training on a dataset comprising 40k videos.
arXiv Detail & Related papers (2024-06-20T09:18:54Z) - Video Diffusion Models: A Survey
Diffusion generative models have recently become a robust technique for producing and modifying coherent, high-quality video.
This survey offers a systematic overview of critical elements of diffusion models for video generation, covering applications, architectural choices, and the modeling of temporal dynamics.
arXiv Detail & Related papers (2024-05-06T04:01:42Z) - FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion
This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models.
We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data.
arXiv Detail & Related papers (2024-04-26T14:59:42Z) - VideoCrafter2: Overcoming Data Limitations for High-Quality Video
Diffusion Models
We investigate the feasibility of leveraging low-quality videos and synthesized high-quality images to obtain a high-quality video model.
We shift the distribution to higher quality without motion degradation by finetuning spatial modules with high-quality images, resulting in a generic high-quality video model.
arXiv Detail & Related papers (2024-01-17T08:30:32Z) - DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance
We propose DreamVideo, a high-fidelity image-to-video generation method that devises a frame retention branch on top of a pre-trained video diffusion model.
Our model has powerful image retention ability and, to the best of our knowledge, delivers the best results on UCF101 among image-to-video models.
arXiv Detail & Related papers (2023-12-05T03:16:31Z) - Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large
Datasets
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z) - LLM-grounded Video Diffusion Models
Video diffusion models have emerged as a promising tool for neural video generation.
Current models still struggle with complex spatiotemporal prompts and often generate restricted or incorrect motion.
We introduce LLM-grounded Video Diffusion (LVD), which first uses a large language model to generate dynamic scene layouts from the text prompt and then uses those layouts to guide the video diffusion model.
Our results demonstrate that LVD significantly outperforms its base video diffusion model.
arXiv Detail & Related papers (2023-09-29T17:54:46Z) - Motion-Conditioned Diffusion Model for Controllable Video Synthesis
We introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes.
We show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis.
arXiv Detail & Related papers (2023-04-27T17:59:32Z) - DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
We present DreamPose, a diffusion-based method for generating animated fashion videos from still images.
Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion.
arXiv Detail & Related papers (2023-04-12T17:59:17Z) - Imagen Video: High Definition Video Generation with Diffusion Models
Imagen Video is a text-conditional video generation system based on a cascade of video diffusion models.
We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge.
arXiv Detail & Related papers (2022-10-05T14:41:38Z)