From Sora What We Can See: A Survey of Text-to-Video Generation
- URL: http://arxiv.org/abs/2405.10674v1
- Date: Fri, 17 May 2024 10:09:09 GMT
- Title: From Sora What We Can See: A Survey of Text-to-Video Generation
- Authors: Rui Sun, Yumin Zhang, Tejal Shah, Jiahao Sun, Shuoying Zhang, Wenqi Li, Haoran Duan, Bo Wei, Rajiv Ranjan
- Abstract summary: Sora, developed by OpenAI, exhibits minute-level world-simulation capabilities.
Despite its notable successes, Sora still encounters various obstacles that need to be resolved.
- Score: 10.204414498390575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With impressive achievements made, artificial intelligence is on the path toward artificial general intelligence. Sora, developed by OpenAI and capable of minute-level world simulation, can be considered a milestone on this developmental path. However, despite its notable successes, Sora still encounters various obstacles that need to be resolved. In this survey, we set out from the perspective of disassembling Sora in text-to-video generation and conduct a comprehensive review of the literature, trying to answer the question "From Sora What We Can See". Specifically, after basic preliminaries regarding the general algorithms are introduced, the literature is categorized along three mutually orthogonal dimensions: evolutionary generators, excellent pursuit, and realistic panorama. Subsequently, the widely used datasets and metrics are organized in detail. Finally, and most importantly, we identify several challenges and open problems in this domain and propose potential directions for future research and development.
Related papers
- Generative Artificial Intelligence Meets Synthetic Aperture Radar: A Survey [49.29751866761522]
This paper aims to investigate the intersection of GenAI and SAR.
First, we illustrate the common data generation-based applications in the SAR field.
Then, the latest GenAI models are systematically reviewed.
Finally, the corresponding applications in the SAR domain are also covered.
arXiv Detail & Related papers (2024-11-05T03:06:00Z) - What Matters in Detecting AI-Generated Videos like Sora? [51.05034165599385]
The gap between synthetic and real-world videos remains under-explored.
In this study, we compare real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion.
Our model is capable of detecting videos generated by Sora with high accuracy, even without exposure to any Sora videos during training.
arXiv Detail & Related papers (2024-06-27T23:03:58Z) - Analysing the Public Discourse around OpenAI's Text-To-Video Model 'Sora' using Topic Modeling [0.0]
This study aims to uncover the dominant themes and narratives surrounding Sora by conducting topic modeling analysis on a corpus of 1,827 Reddit comments.
The comments were collected over a two-month period following Sora's announcement in February 2024.
The results highlight prominent narratives around Sora's potential impact on industries and employment, public sentiment and ethical concerns, creative applications, and use cases in the media and entertainment sectors.
arXiv Detail & Related papers (2024-05-30T01:55:30Z) - "Sora is Incredible and Scary": Emerging Governance Challenges of Text-to-Video Generative AI Models [1.4999444543328293]
We report a qualitative social media analysis aiming to uncover people's perceptions of Sora's impact and their concerns about its integration.
We found that people were most concerned about Sora's impact on content creation-related industries.
Potential regulatory solutions included law-enforced labeling of AI content and AI literacy education for the public.
arXiv Detail & Related papers (2024-04-10T02:03:59Z) - Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation [30.245348014602577]
We discuss the evolution of text-to-video generation, from animating MNIST digits to simulating the physical world with Sora.
Our review of the shortcomings of Sora-generated videos points to the need for more in-depth studies of the various enabling aspects of video generation.
We conclude that the study of text-to-video generation may still be in its infancy, requiring contributions from the cross-disciplinary research community.
arXiv Detail & Related papers (2024-03-08T07:58:13Z) - Sora OpenAI's Prelude: Social Media Perspectives on Sora OpenAI and the Future of AI Video Generation [30.556463355261695]
This study investigates the public's perception of Sora OpenAI, a pioneering Gen-AI video generation tool, via social media discussions on Reddit.
The analysis forecasts positive shifts in content creation, predicting that Sora will democratize video marketing and innovate game development.
There are concerns about deepfakes and the potential for disinformation, underscoring the need for strategies to address these risks and bias.
arXiv Detail & Related papers (2024-03-02T00:16:22Z) - Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models [59.54172719450617]
Sora is a text-to-video generative AI model, released by OpenAI in February 2024.
This paper presents a review of the model's background, related technologies, applications, remaining challenges, and future directions.
arXiv Detail & Related papers (2024-02-27T03:30:58Z) - Incremental 3D Scene Completion for Safe and Efficient Exploration Mapping and Planning [60.599223456298915]
We propose a novel way to integrate deep learning into exploration by leveraging 3D scene completion for informed, safe, and interpretable mapping and planning.
We show that our method can speed up coverage of an environment by 73% compared to the baselines, with only a minimal reduction in map accuracy.
Even if scene completions are not included in the final map, we show that they can be used to guide the robot to choose more informative paths, speeding up the measurement of the scene with the robot's sensors by 35%.
arXiv Detail & Related papers (2022-08-17T14:19:33Z) - What Is Considered Complete for Visual Recognition? [110.43159801737222]
We advocate for a new type of pre-training task named learning-by-compression.
The computational models are optimized to represent the visual data using compact features.
Semantic annotations, when available, play the role of weak supervision.
arXiv Detail & Related papers (2021-05-28T16:59:14Z) - An Exploration of Embodied Visual Exploration [97.21890864063872]
Embodied computer vision considers perception for robots in novel, unstructured environments.
We present a taxonomy for existing visual exploration algorithms and create a standard framework for benchmarking them.
We then perform a thorough empirical study of the four state-of-the-art paradigms using the proposed framework.
arXiv Detail & Related papers (2020-01-07T17:40:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.