A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images
- URL: http://arxiv.org/abs/2502.21151v2
- Date: Mon, 10 Mar 2025 22:22:19 GMT
- Title: A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images
- Authors: Zineb Sordo, Eric Chagnon, Daniela Ushizima,
- Abstract summary: This review surveys the state-of-the-art in text-to-image and image-to-image generation within the scope of generative AI.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This review surveys the state-of-the-art in text-to-image and image-to-image generation within the scope of generative AI. We provide a comparative analysis of three prominent architectures: Variational Autoencoders, Generative Adversarial Networks and Diffusion Models. For each, we elucidate core concepts, architectural innovations, and practical strengths and limitations, particularly for scientific image understanding. Finally, we discuss critical open challenges and potential future research directions in this rapidly evolving field.
Related papers
- Personalized Image Generation with Deep Generative Models: A Decade Survey [51.26287478042516]
We present a review of generalized personalized image generation across various generative models.<n>We first define a unified framework that standardizes the personalization process across different generative models.<n>We then provide an in-depth analysis of personalization techniques within each generative model, highlighting their unique contributions and innovations.
arXiv Detail & Related papers (2025-02-18T17:34:04Z) - Generative AI for Vision: A Comprehensive Study of Frameworks and Applications [0.0]
Generative AI is transforming image synthesis, enabling the creation of high-quality, diverse, and photorealistic visuals.<n>This work presents a structured classification of image generation techniques based on the nature of the input.<n>We highlight key frameworks including DALL-E, ControlNet, and DeepSeek Janus-Pro, and address challenges such as computational costs, data biases, and output alignment with user intent.
arXiv Detail & Related papers (2025-01-29T22:42:05Z) - KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z) - Diffusion-Based Visual Art Creation: A Survey and New Perspectives [51.522935314070416]
This survey explores the emerging realm of diffusion-based visual art creation, examining its development from both artistic and technical perspectives.
Our findings reveal how artistic requirements are transformed into technical challenges and highlight the design and application of diffusion-based methods within visual art creation.
We aim to shed light on the mechanisms through which AI systems emulate and possibly, enhance human capacities in artistic perception and creativity.
arXiv Detail & Related papers (2024-08-22T04:49:50Z) - Generative Artificial Intelligence: A Systematic Review and Applications [7.729155237285151]
This paper documents the systematic review and analysis of recent advancements and techniques in Generative AI.
The major impact that generative AI has made to date, has been in language generation with the development of large language models.
The paper ends with a discussion of Responsible AI principles, and the necessary ethical considerations for the sustainability and growth of these generative models.
arXiv Detail & Related papers (2024-05-17T18:03:59Z) - Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation [30.245348014602577]
We discuss the evolution of video generation from text, starting with animating MNIST numbers to simulating the physical world with Sora.
Our review into the shortcomings of Sora-generated videos pinpoints the call for more in-depth studies in various enabling aspects of video generation.
We conclude that the study of the text-to-video generation may still be in its infancy, requiring contribution from the cross-discipline research community.
arXiv Detail & Related papers (2024-03-08T07:58:13Z) - State of the Art on Diffusion Models for Visual Computing [191.6168813012954]
This report introduces the basic mathematical concepts of diffusion models, implementation details and design choices of the popular Stable Diffusion model.
We also give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing.
We discuss available datasets, metrics, open challenges, and social implications.
arXiv Detail & Related papers (2023-10-11T05:32:29Z) - RenAIssance: A Survey into AI Text-to-Image Generation in the Era of
Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the usage of models that could process text input and generate high fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model used for the generation of images through the systematic introduction of noises with repeating steps.
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z) - Text-to-image Diffusion Models in Generative AI: A Survey [86.11421833017693]
This survey reviews the progress of diffusion models in generating images from text.
We discuss applications beyond image generation, such as text-guided generation for various modalities like videos, and text-guided image editing.
arXiv Detail & Related papers (2023-03-14T13:49:54Z) - From paintbrush to pixel: A review of deep neural networks in AI-generated art [0.0]
This paper explores the various deep neural network architectures and models that have been utilized to create AI-generated art.
From the classic convolutional networks to the cutting-edge diffusion models, we examine the key players in the field.
With a unique blend of technical explanations and insights into the current state of AI-generated art, this paper exemplifies how art and computer science interact.
arXiv Detail & Related papers (2023-02-14T16:58:32Z) - Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
arXiv Detail & Related papers (2022-09-29T00:57:28Z) - Adversarial Text-to-Image Synthesis: A Review [7.593633267653624]
We contextualize the state of the art of adversarial text-to-image synthesis models, their development since their inception five years ago, and propose a taxonomy based on the level of supervision.
We critically examine current strategies to evaluate text-to-image synthesis models, highlight shortcomings, and identify new areas of research, ranging from the development of better datasets and evaluation metrics to possible improvements in architectural design and model training.
This review complements previous surveys on generative adversarial networks with a focus on text-to-image synthesis which we believe will help researchers to further advance the field.
arXiv Detail & Related papers (2021-01-25T09:58:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.