Related papers: Survey of Video Diffusion Models: Foundations, Implementations, and Applications

URL: http://arxiv.org/abs/2504.16081v1
Date: Tue, 22 Apr 2025 17:59:17 GMT
Title: Survey of Video Diffusion Models: Foundations, Implementations, and Applications
Authors: Yimu Wang, Xuye Liu, Wei Pang, Li Ma, Shuai Yuan, Paul Debevec, Ning Yu,
Abstract summary: Recent advances in diffusion models have revolutionized video generation, offering superior temporal consistency and visual quality compared to traditional generative adversarial networks-based approaches. This survey provides a comprehensive review of diffusion-based video generation, examining its evolution, technical foundations, and practical applications. We present a systematic taxonomy of current methodologies, analyze architectural innovations and optimization strategies, and investigate applications across low-level vision tasks such as denoising and super-resolution.
Score: 15.060158551865099
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in diffusion models have revolutionized video generation, offering superior temporal consistency and visual quality compared to traditional generative adversarial networks-based approaches. While this emerging field shows tremendous promise in applications, it faces significant challenges in motion consistency, computational efficiency, and ethical considerations. This survey provides a comprehensive review of diffusion-based video generation, examining its evolution, technical foundations, and practical applications. We present a systematic taxonomy of current methodologies, analyze architectural innovations and optimization strategies, and investigate applications across low-level vision tasks such as denoising and super-resolution. Additionally, we explore the synergies between diffusion-based video generation and related domains, including video representation learning, question answering, and retrieval. Compared to the existing surveys (Lei et al., 2024a;b; Melnik et al., 2024; Cao et al., 2023; Xing et al., 2024c) which focus on specific aspects of video generation, such as human video synthesis (Lei et al., 2024a) or long-form content generation (Lei et al., 2024b), our work provides a broader, more updated, and more fine-grained perspective on diffusion-based approaches with a special section for evaluation metrics, industry solutions, and training engineering techniques in video generation. This survey serves as a foundational resource for researchers and practitioners working at the intersection of diffusion models and video generation, providing insights into both the theoretical frameworks and practical implementations that drive this rapidly evolving field. A structured list of related works involved in this survey is also available on https://github.com/Eyeline-Research/Survey-Video-Diffusion.

Related papers

Motion Generation: A Survey of Generative Approaches and Benchmarks [1.4254358932994455]
We provide an in-depth categorization of motion generation methods based on their underlying generative strategies. Our main focus is on papers published in top-tier venues since 2023, reflecting the most recent advancements in the field. We analyze architectural principles, conditioning mechanisms, and generation settings, and compile a detailed overview of the evaluation metrics and datasets used across the literature.
arXiv Detail & Related papers (2025-07-07T19:04:56Z)
RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism [73.38167494118746]
We propose a framework to improve the realism of motion in generated videos. We advocate for the incorporation of a retrieval mechanism during the generation phase. Our pipeline is designed to apply to any text-to-video diffusion model.
arXiv Detail & Related papers (2025-04-09T08:14:05Z)
Video Summarization Techniques: A Comprehensive Review [1.6381055567716192]
The paper explores the various approaches and methods developed for video summarization, emphasizing both abstractive and extractive strategies. Extractive summarization identifies key frames or segments from the source video, using methods such as shot boundary recognition and clustering. Abstractive summarization, on the other hand, creates new content by extracting the essential content from the video, using machine learning models such as deep neural networks together with natural language processing, reinforcement learning, attention mechanisms, generative adversarial networks, and multi-modal learning.
arXiv Detail & Related papers (2024-10-06T11:17:54Z)
Diffusion Models in Low-Level Vision: A Survey [82.77962165415153]
Diffusion model-based solutions have emerged as widely acclaimed for their ability to produce samples of superior quality and diversity. We present three generic diffusion modeling frameworks and explore their correlations with other deep generative models. We summarize extended diffusion models applied in other tasks, including medical, remote sensing, and video scenarios.
arXiv Detail & Related papers (2024-06-17T01:49:27Z)
Video Diffusion Models: A Survey [3.7985353171858045]
Diffusion generative models have recently become a powerful technique for creating and modifying high-quality, coherent video content. This survey provides an overview of the critical components of diffusion models for video generation, including their applications, architectural design, and temporal dynamics modeling.
arXiv Detail & Related papers (2024-05-06T04:01:42Z)
A Survey on Long Video Generation: Challenges, Methods, and Prospects [36.58662591921549]
This paper presents the first survey of recent advancements in long video generation. We summarise them into two key paradigms: divide and conquer, and temporal autoregressive. We offer a comprehensive overview and classification of the datasets and evaluation metrics which are crucial for advancing long video generation research.
arXiv Detail & Related papers (2024-03-25T03:47:53Z)
A Survey on Super Resolution for video Enhancement Using GAN [0.0]
Recent developments in image and video super-resolution using deep learning algorithms such as Generative Adversarial Networks are covered. Advancements that aim to increase the visual clarity and quality of low-resolution video have tremendous potential in a variety of sectors ranging from surveillance technology to medical imaging. This collection delves into the wider field of Generative Adversarial Networks, exploring their principles, training approaches, and applications across a broad range of domains.
arXiv Detail & Related papers (2023-12-27T08:41:38Z)
A Survey on Video Diffusion Models [103.03565844371711]
The recent wave of AI-generated content (AIGC) has witnessed substantial success in computer vision. Due to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers. This paper presents a comprehensive review of video diffusion models in the AIGC era.
arXiv Detail & Related papers (2023-10-16T17:59:28Z)
A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications. Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
Video Summarization Using Deep Neural Networks: A Survey [72.98424352264904]
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content. This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization.
arXiv Detail & Related papers (2021-01-15T11:41:29Z)
An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement [132.60976158877608]
We propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples. In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information. The proposed design allows our recurrent cells to efficiently propagate temporal information across frames and reduces the need for high complexity networks.
arXiv Detail & Related papers (2020-12-24T00:03:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
