Sora Generates Videos with Stunning Geometrical Consistency
- URL: http://arxiv.org/abs/2402.17403v1
- Date: Tue, 27 Feb 2024 10:49:05 GMT
- Title: Sora Generates Videos with Stunning Geometrical Consistency
- Authors: Xuanyi Li, Daquan Zhou, Chenxu Zhang, Shaodong Wei, Qibin Hou and
Ming-Ming Cheng
- Abstract summary: We introduce a new benchmark that assesses the quality of the generated videos based on their adherence to real-world physics principles.
We employ a method that transforms the generated videos into 3D models, leveraging the premise that the accuracy of 3D reconstruction is heavily contingent on the video quality.
- Score: 75.46675626542837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently developed Sora model [1] has exhibited remarkable capabilities
in video generation, sparking intense discussions regarding its ability to
simulate real-world phenomena. Despite its growing popularity, there is a lack
of established metrics to evaluate its fidelity to real-world physics
quantitatively. In this paper, we introduce a new benchmark that assesses the
quality of the generated videos based on their adherence to real-world physics
principles. We employ a method that transforms the generated videos into 3D
models, leveraging the premise that the accuracy of 3D reconstruction is
heavily contingent on the video quality. From the perspective of 3D
reconstruction, we use the fidelity of the geometric constraints satisfied by
the constructed 3D models as a proxy to gauge the extent to which the generated
videos conform to real-world physics rules. Project page:
https://sora-geometrical-consistency.github.io/
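The listing does not spell out the paper's pipeline, but the core idea of using geometric-constraint fidelity as a quality proxy can be illustrated with a minimal sketch. The Python snippet below, assuming frames have already been extracted from a generated clip, scores how well consecutive frames satisfy a single epipolar geometry via SIFT matching and RANSAC fundamental-matrix fitting with OpenCV. The function name, the 0.75 ratio-test threshold, and the use of pairwise epipolar checks are illustrative assumptions; the paper itself builds full 3D reconstructions rather than this simplified two-view test.

```python
import cv2
import numpy as np

def epipolar_consistency(frame_a, frame_b, ratio=0.75, ransac_thresh=1.0):
    """Score how well two video frames satisfy one rigid-scene epipolar geometry.

    Returns the RANSAC inlier ratio of fundamental-matrix fitting over SIFT
    matches; used here as a rough geometric-consistency proxy, not the paper's
    actual metric.
    """
    sift = cv2.SIFT_create()
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    kp_a, des_a = sift.detectAndCompute(gray_a, None)
    kp_b, des_b = sift.detectAndCompute(gray_b, None)
    if des_a is None or des_b is None:
        return 0.0

    # Lowe's ratio test keeps only reliable correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des_a, des_b, k=2)
    good = [m[0] for m in knn
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]
    if len(good) < 8:
        return 0.0  # too few matches to estimate a fundamental matrix

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])

    # Matches consistent with a single epipolar geometry count as inliers.
    F, mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC,
                                     ransac_thresh, 0.999)
    if F is None or mask is None:
        return 0.0
    return float(mask.sum()) / len(good)

# Example: average the score over consecutive frame pairs of a generated clip.
# frames = [cv2.imread(p) for p in sorted(glob.glob("frames/*.png"))]
# score = np.mean([epipolar_consistency(a, b) for a, b in zip(frames, frames[1:])])
```

A higher inlier ratio suggests the generated frames behave like views of one consistent 3D scene, which is the intuition behind using reconstruction quality as a stand-in for physical plausibility.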
Related papers
- ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model [16.14713604672497]
ReconX is a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task.
The proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition.
Guided by the condition, the video diffusion model then synthesizes video frames that are both detail-preserved and exhibit a high degree of 3D consistency.
arXiv Detail & Related papers (2024-08-29T17:59:40Z) - What Matters in Detecting AI-Generated Videos like Sora? [51.05034165599385]
The gap between synthetic and real-world videos remains under-explored.
In this study, we compare real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion.
Our model is capable of detecting videos generated by Sora with high accuracy, even without exposure to any Sora videos during training.
arXiv Detail & Related papers (2024-06-27T23:03:58Z) - VideoPhy: Evaluating Physical Commonsense for Video Generation [93.28748850301949]
We present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities.
We then generate videos conditioned on captions from diverse state-of-the-art text-to-video generative models.
Our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts.
arXiv Detail & Related papers (2024-06-05T17:53:55Z) - Precise-Physics Driven Text-to-3D Generation [24.180947937863355]
We propose Phy3DGen, a precise-physics-driven text-to-3D generation method.
By analyzing the solid mechanics of generated 3D shapes, we reveal that the 3D shapes generated by existing text-to-3D generation methods are impractical for real-world applications.
arXiv Detail & Related papers (2024-03-19T04:51:38Z) - Towards Live 3D Reconstruction from Wearable Video: An Evaluation of
V-SLAM, NeRF, and Videogrammetry Techniques [20.514826446476267]
Mixed reality (MR) is a key technology which promises to change the future of warfare.
To enable this technology, a large-scale 3D model of a physical environment must be maintained based on live sensor observations.
We survey several 3D reconstruction algorithms for large-scale mapping in military applications, given only live video.
arXiv Detail & Related papers (2022-11-21T19:57:51Z) - 3D-Aware Video Generation [149.5230191060692]
We explore 4D generative adversarial networks (GANs) that learn to generate 3D-aware videos.
By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D videos supervised only with monocular videos.
arXiv Detail & Related papers (2022-06-29T17:56:03Z) - LASR: Learning Articulated Shape Reconstruction from a Monocular Video [97.92849567637819]
We introduce a template-free approach to learn 3D shapes from a single video.
Our method faithfully reconstructs nonrigid 3D structures from videos of humans, animals, and objects of unknown classes.
arXiv Detail & Related papers (2021-05-06T21:41:11Z) - Online Adaptation for Consistent Mesh Reconstruction in the Wild [147.22708151409765]
We pose video-based reconstruction as a self-supervised online adaptation problem applied to any incoming test video.
We demonstrate that our algorithm recovers temporally consistent and reliable 3D structures from videos of non-rigid objects including those of animals captured in the wild.
arXiv Detail & Related papers (2020-12-06T07:22:27Z)