A Survey of AI Text-to-Image and AI Text-to-Video Generators
- URL: http://arxiv.org/abs/2311.06329v1
- Date: Fri, 10 Nov 2023 17:33:58 GMT
- Title: A Survey of AI Text-to-Image and AI Text-to-Video Generators
- Authors: Aditi Singh
- Abstract summary: Text-to-Image and Text-to-Video AI generation models are revolutionary technologies that use deep learning and natural language processing (NLP) techniques to create images and videos from textual descriptions.
This paper investigates cutting-edge approaches in the discipline of Text-to-Image and Text-to-Video AI generations.
- Score: 0.4662017507844857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-Image and Text-to-Video AI generation models are revolutionary
technologies that use deep learning and natural language processing (NLP)
techniques to create images and videos from textual descriptions. This paper
investigates cutting-edge approaches in the discipline of Text-to-Image and
Text-to-Video AI generations. The survey provides an overview of the existing
literature as well as an analysis of the approaches used in various studies. It
covers data preprocessing techniques, neural network types, and evaluation
metrics used in the field. In addition, the paper discusses the challenges and
limitations of Text-to-Image and Text-to-Video AI generations, as well as
future research directions. Overall, these models have promising potential for
a wide range of applications such as video production, content creation, and
digital marketing.
Related papers
- LLM as an Art Director (LaDi): Using LLMs to improve Text-to-Media Generators [33.7054351451505]
We describe the techniques that can be used to make Large Language Models (LLMs) act as Art Directors that enhance image and video generation.
We explore how LaDi integrates multiple techniques for augmenting the capabilities of text-to-image generators (T2Is) and text-to-video generators (T2Vs).
arXiv Detail & Related papers (2023-11-07T04:44:40Z)
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the use of models that process text input and generate high-fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model, producing images through the systematic introduction of noise over repeated steps.
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
- Learning Universal Policies via Text-Guided Video Generation [179.6347119101618]
A goal of artificial intelligence is to construct an agent that can solve a wide variety of tasks.
Recent progress in text-guided image synthesis has yielded models with an impressive ability to generate complex novel images.
We investigate whether such tools can be used to construct more general-purpose agents.
arXiv Detail & Related papers (2023-01-31T21:28:13Z)
- Vision-Language Pre-training: Basics, Recent Advances, and Future Trends [158.34830433299268]
Vision-language pre-training methods for multimodal intelligence have been developed in the last few years.
For each category, we present a comprehensive review of state-of-the-art methods, and discuss the progress that has been made and challenges still being faced.
In addition, we discuss advanced topics being actively explored in the research community, such as big foundation models, unified modeling, in-context few-shot learning, knowledge, robustness, and computer vision in the wild, to name a few.
arXiv Detail & Related papers (2022-10-17T17:11:36Z)
- Visualize Before You Write: Imagination-Guided Open-Ended Text Generation [68.96699389728964]
We propose iNLG, which uses machine-generated images to guide language models in open-ended text generation.
Experiments and analyses demonstrate the effectiveness of iNLG on open-ended text generation tasks.
arXiv Detail & Related papers (2022-10-07T18:01:09Z)
- A Taxonomy of Prompt Modifiers for Text-To-Image Generation [6.903929927172919]
This paper identifies six types of prompt modifiers used by practitioners in the online community, based on a 3-month ethnographic study.
The novel taxonomy of prompt modifiers provides researchers with a conceptual starting point for investigating the practice of text-to-image generation.
We discuss research opportunities of this novel creative practice in the field of Human-Computer Interaction.
arXiv Detail & Related papers (2022-04-20T06:15:50Z)
- Video Generation from Text Employing Latent Path Construction for Temporal Modeling [70.06508219998778]
Video generation is one of the most challenging tasks in the fields of Machine Learning and Computer Vision.
In this paper, we tackle the text-to-video generation problem, which is a conditional form of video generation.
We believe that video generation from natural language sentences will have an important impact on Artificial Intelligence.
arXiv Detail & Related papers (2021-07-29T06:28:20Z)
- A Survey of Knowledge-Enhanced Text Generation [81.24633231919137]
The goal of text generation is to make machines express themselves in human language.
Various neural encoder-decoder models have been proposed to achieve the goal by learning to map input text to output text.
To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models.
arXiv Detail & Related papers (2020-10-09T06:46:46Z)
- TiVGAN: Text to Image to Video Generation with Step-by-Step Evolutionary Generator [34.7504057664375]
We propose a novel training framework, Text-to-Image-to-Video Generative Adversarial Network (TiVGAN), which evolves frame-by-frame and finally produces a full-length video.
The step-by-step learning process helps stabilize training and enables the creation of high-resolution video based on conditional text descriptions.
arXiv Detail & Related papers (2020-09-04T06:33:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.