State of the Art on Diffusion Models for Visual Computing
- URL: http://arxiv.org/abs/2310.07204v1
- Date: Wed, 11 Oct 2023 05:32:29 GMT
- Title: State of the Art on Diffusion Models for Visual Computing
- Authors: Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T.
Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski,
Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias
Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein
- Abstract summary: This report introduces the basic mathematical concepts of diffusion models, as well as the implementation details and design choices of the popular Stable Diffusion model.
We also give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing.
We discuss available datasets, metrics, open challenges, and social implications.
- Score: 191.6168813012954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field of visual computing is rapidly advancing due to the emergence of
generative artificial intelligence (AI), which unlocks unprecedented
capabilities for the generation, editing, and reconstruction of images, videos,
and 3D scenes. In these domains, diffusion models are the generative AI
architecture of choice. Within the last year alone, the literature on
diffusion-based tools and applications has seen exponential growth and relevant
papers are published across the computer graphics, computer vision, and AI
communities with new works appearing daily on arXiv. This rapid growth of the
field makes it difficult to keep up with all recent developments. The goal of
this state-of-the-art report (STAR) is to introduce the basic mathematical
concepts of diffusion models, implementation details and design choices of the
popular Stable Diffusion model, as well as to survey important aspects of these
generative AI tools, including personalization, conditioning, and inversion,
among others. Moreover, we give a comprehensive overview of the rapidly growing
literature on diffusion-based generation and editing, categorized by the type
of generated medium, including 2D images, videos, 3D objects, locomotion, and
4D scenes. Finally, we discuss available datasets, metrics, open challenges,
and social implications. This STAR provides an intuitive starting point to
explore this exciting topic for researchers, artists, and practitioners alike.
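For orientation before the listing below: the "basic mathematical concepts" referenced in the abstract typically center on the denoising diffusion probabilistic model (DDPM). The following is a standard textbook formulation, included here as a sketch rather than a quotation from the report:

```latex
% Forward (noising) process: data x_0 is gradually corrupted over T steps
% according to a variance schedule \beta_1, \dots, \beta_T:
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\big)

% With \alpha_t = 1-\beta_t and \bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s,
% any x_t can be sampled directly from x_0 in one shot:
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\big)

% Training: a network \epsilon_\theta is fit to predict the injected noise.
L_{\mathrm{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0,I)}
  \Big[ \big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0
  + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t\big) \big\|^2 \Big]
```

Sampling then runs the learned reverse process, starting from Gaussian noise x_T and iteratively denoising toward a data sample x_0.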
Related papers
- Diffusion Models in 3D Vision: A Survey [11.116658321394755]
We review the state-of-the-art approaches that leverage diffusion models for 3D visual tasks.
These approaches include 3D object generation, shape completion, point cloud reconstruction, and scene understanding.
We discuss potential solutions, including improving computational efficiency, enhancing multimodal fusion, and exploring the use of large-scale pretraining.
arXiv Detail & Related papers (2024-10-07T04:12:23Z)
- Generative AI in Vision: A Survey on Models, Metrics and Applications [0.0]
Generative AI models have revolutionized various fields by enabling the creation of realistic and diverse data samples.
Among these models, diffusion models have emerged as a powerful approach for generating high-quality images, text, and audio.
This survey paper provides a comprehensive overview of diffusion models and legacy generative AI models, focusing on their underlying techniques, applications across different domains, and challenges.
arXiv Detail & Related papers (2024-02-26T07:47:12Z)
- Advances in 3D Generation: A Survey [54.95024616672868]
The field of 3D content generation is developing rapidly, enabling the creation of increasingly high-quality and diverse 3D models.
Specifically, we introduce the 3D representations that serve as the backbone for 3D generation.
We provide a comprehensive overview of the rapidly growing literature on generation methods, categorized by the type of algorithmic paradigms.
arXiv Detail & Related papers (2024-01-31T13:06:48Z)
- Towards Graph Foundation Models: A Survey and Beyond [66.37994863159861]
Foundation models have emerged as critical components in a variety of artificial intelligence applications.
The capabilities of foundation models to generalize and adapt motivate graph machine learning researchers to discuss the potential of developing a new graph learning paradigm.
This article introduces the concept of Graph Foundation Models (GFMs), and offers an exhaustive explanation of their key characteristics and underlying technologies.
arXiv Detail & Related papers (2023-10-18T09:31:21Z)
- A Survey on Video Diffusion Models [103.03565844371711]
The recent wave of AI-generated content (AIGC) has witnessed substantial success in computer vision.
Due to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers.
This paper presents a comprehensive review of video diffusion models in the AIGC era.
arXiv Detail & Related papers (2023-10-16T17:59:28Z)
- Breathing New Life into 3D Assets with Generative Repainting [74.80184575267106]
Diffusion-based text-to-image models have attracted immense attention from the vision community, artists, and content creators.
Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields.
We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools.
Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, and orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools (see the sketch after this entry).
arXiv Detail & Related papers (2023-09-15T16:34:51Z)
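The generative-repainting entry above alternates between 2D refinement and 3D consistency enforcement. Below is a schematic, hedged sketch of that loop; every function is a hypothetical stand-in, not the paper's actual API (a real pipeline would rasterize the mesh, call a pretrained img2img diffusion model, and back-project into a UV texture):

```python
# Schematic sketch of an alternating 2D-refine / 3D-fuse loop. All functions
# here are hypothetical placeholders for illustration only.
import numpy as np

def render_view(texture: np.ndarray, view: int) -> np.ndarray:
    """Placeholder renderer; stands in for rasterizing the mesh from one camera."""
    return texture

def diffusion_refine(image: np.ndarray, strength: float = 0.3) -> np.ndarray:
    """Placeholder for 2D generative refinement of a single rendered view."""
    noise = np.random.default_rng(0).standard_normal(image.shape)
    return (1.0 - strength) * image + strength * noise

def fuse_into_texture(texture: np.ndarray, refined: np.ndarray) -> np.ndarray:
    """Placeholder for the 3D consistency step: merge the refined view back
    into the shared texture so all viewpoints stay coherent."""
    return 0.5 * texture + 0.5 * refined

texture = np.zeros((256, 256, 3))   # blank texture for an untextured mesh
for view in range(8):               # iterate over camera viewpoints
    rendered = render_view(texture, view)
    refined = diffusion_refine(rendered)
    texture = fuse_into_texture(texture, refined)
```

The design point the entry makes is that the two tools stay standalone: the diffusion model only ever sees 2D renders, and consistency comes from fusing its edits into one shared 3D representation.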
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the use of models that process text input and generate high-fidelity images based on the text descriptions.
Diffusion models are one prominent type of generative model, producing images through the systematic introduction of noise over repeated steps (see the sketch after this entry).
In the era of large models, scaling up model size and integration with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
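The "systematic introduction of noise over repeated steps" described in the entry above is the forward process of a diffusion model. A minimal NumPy sketch, assuming an illustrative linear variance schedule and step count (not any particular paper's settings):

```python
# Minimal sketch of the forward (noising) process: an image is corrupted over
# T steps by repeatedly mixing in Gaussian noise. Schedule values are
# illustrative assumptions only.
import numpy as np

def forward_diffuse(x0: np.ndarray, T: int = 1000, beta_start: float = 1e-4,
                    beta_end: float = 0.02, seed: int = 0) -> np.ndarray:
    """Apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps for T steps."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, T)  # linear variance schedule
    x = x0
    for beta in betas:
        eps = rng.standard_normal(x.shape)        # fresh Gaussian noise each step
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps
    return x

x0 = np.zeros((64, 64, 3))   # stand-in for a normalized image in [-1, 1]
xT = forward_diffuse(x0)
print(xT.mean(), xT.std())   # near 0 and 1: the signal has been destroyed
```

Generation works by learning to invert this trajectory, denoising from x_T back toward a clean sample.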
- From paintbrush to pixel: A review of deep neural networks in AI-generated art [0.0]
This paper explores the various deep neural network architectures and models that have been utilized to create AI-generated art.
From the classic convolutional networks to the cutting-edge diffusion models, we examine the key players in the field.
With a unique blend of technical explanations and insights into the current state of AI-generated art, this paper exemplifies how art and computer science interact.
arXiv Detail & Related papers (2023-02-14T16:58:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.