Video Diffusion Models: A Survey
- URL: http://arxiv.org/abs/2405.03150v2
- Date: Sun, 17 Nov 2024 00:40:09 GMT
- Title: Video Diffusion Models: A Survey
- Authors: Andrew Melnik, Michal Ljubljanac, Cong Lu, Qi Yan, Weiming Ren, Helge Ritter,
- Abstract summary: Diffusion generative models have recently become a powerful technique for creating and modifying high-quality, coherent video content.
This survey provides an overview of the critical components of diffusion models for video generation, including their applications, architectural design, and temporal dynamics modeling.
- Score: 3.7985353171858045
- License:
- Abstract: Diffusion generative models have recently become a powerful technique for creating and modifying high-quality, coherent video content. This survey provides a comprehensive overview of the critical components of diffusion models for video generation, including their applications, architectural design, and temporal dynamics modeling. The paper begins by discussing the core principles and mathematical formulations, then explores various architectural choices and methods for maintaining temporal consistency. A taxonomy of applications is presented, categorizing models based on input modalities such as text prompts, images, videos, and audio signals. Advancements in text-to-video generation are discussed to illustrate the state-of-the-art capabilities and limitations of current approaches. Additionally, the survey summarizes recent developments in training and evaluation practices, including the use of diverse video and image datasets and the adoption of various evaluation metrics to assess model performance. The survey concludes with an examination of ongoing challenges, such as generating longer videos and managing computational costs, and offers insights into potential future directions for the field. By consolidating the latest research and developments, this survey aims to serve as a valuable resource for researchers and practitioners working with video diffusion models. Website: https://github.com/ndrwmlnk/Awesome-Video-Diffusion-Models
Related papers
- Diffusion Model-Based Video Editing: A Survey [47.45047496559506]
This paper reviews diffusion model-based video editing techniques, including theoretical foundations and practical applications.
We categorize video editing approaches by the inherent connections of their core technologies, depicting evolutionary trajectory.
This paper also dives into novel applications, including point-based editing and pose-guided human video editing.
arXiv Detail & Related papers (2024-06-26T04:58:39Z) - Diffusion Models in Low-Level Vision: A Survey [82.77962165415153]
diffusion model-based solutions have emerged as widely acclaimed for their ability to produce samples of superior quality and diversity.
We present three generic diffusion modeling frameworks and explore their correlations with other deep generative models.
We summarize extended diffusion models applied in other tasks, including medical, remote sensing, and video scenarios.
arXiv Detail & Related papers (2024-06-17T01:49:27Z) - Generative AI in Vision: A Survey on Models, Metrics and Applications [0.0]
Generative AI models have revolutionized various fields by enabling the creation of realistic and diverse data samples.
Among these models, diffusion models have emerged as a powerful approach for generating high-quality images, text, and audio.
This survey paper provides a comprehensive overview of generative AI diffusion and legacy models, focusing on their underlying techniques, applications across different domains, and their challenges.
arXiv Detail & Related papers (2024-02-26T07:47:12Z) - Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large
Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z) - Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z) - EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that it is hard to judge the large conditional generative models from the simple metrics since these models are often trained on very large datasets with multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
arXiv Detail & Related papers (2023-10-17T17:50:46Z) - A Survey on Video Diffusion Models [103.03565844371711]
The recent wave of AI-generated content (AIGC) has witnessed substantial success in computer vision.
Due to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers.
This paper presents a comprehensive review of video diffusion models in the AIGC era.
arXiv Detail & Related papers (2023-10-16T17:59:28Z) - Video Summarization Using Deep Neural Networks: A Survey [72.98424352264904]
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content.
This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization.
arXiv Detail & Related papers (2021-01-15T11:41:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.