Sora Generates Videos with Stunning Geometrical Consistency
- URL: http://arxiv.org/abs/2402.17403v1
- Date: Tue, 27 Feb 2024 10:49:05 GMT
- Title: Sora Generates Videos with Stunning Geometrical Consistency
- Authors: Xuanyi Li, Daquan Zhou, Chenxu Zhang, Shaodong Wei, Qibin Hou and
Ming-Ming Cheng
- Abstract summary: We introduce a new benchmark that assesses the quality of the generated videos based on their adherence to real-world physics principles.
We employ a method that transforms the generated videos into 3D models, leveraging the premise that the accuracy of 3D reconstruction is heavily contingent on the video quality.
- Score: 75.46675626542837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently developed Sora model [1] has exhibited remarkable capabilities
in video generation, sparking intense discussions regarding its ability to
simulate real-world phenomena. Despite its growing popularity, there is a lack
of established metrics to evaluate its fidelity to real-world physics
quantitatively. In this paper, we introduce a new benchmark that assesses the
quality of the generated videos based on their adherence to real-world physics
principles. We employ a method that transforms the generated videos into 3D
models, leveraging the premise that the accuracy of 3D reconstruction is
heavily contingent on the video quality. From the perspective of 3D
reconstruction, we use the fidelity of the geometric constraints satisfied by
the constructed 3D models as a proxy to gauge the extent to which the generated
videos conform to real-world physics rules. Project page:
https://sora-geometrical-consistency.github.io/
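The benchmark's premise is that a physically plausible video should also be geometrically consistent across frames, so reconstruction-style checks can serve as a proxy metric. As a minimal sketch of that idea (not the paper's exact pipeline, which reconstructs full 3D models), the snippet below scores a pair of consecutive frames by the fraction of feature matches consistent with a single fundamental matrix; the SIFT features, Lowe ratio, and RANSAC threshold are illustrative assumptions.

```python
# Rough proxy for geometric consistency between two video frames:
# the inlier ratio of RANSAC fundamental-matrix fitting over feature matches.
import cv2
import numpy as np

def epipolar_inlier_ratio(frame_a_path: str, frame_b_path: str) -> float:
    """Return the fraction of SIFT matches consistent with one fundamental
    matrix between two frames (closer to 1.0 = more geometrically consistent)."""
    img_a = cv2.imread(frame_a_path, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(frame_b_path, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0.0

    # Lowe's ratio test keeps only confident correspondences.
    matcher = cv2.BFMatcher()
    good = []
    for pair in matcher.knnMatch(des_a, des_b, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])
    if len(good) < 8:  # need at least 8 correspondences to estimate F
        return 0.0

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])

    # Fit a fundamental matrix with RANSAC; inliers satisfy epipolar geometry.
    _, mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC, 1.0, 0.999)
    if mask is None:
        return 0.0
    return float(mask.sum()) / len(good)
```

A generated video could then be scored by averaging this ratio over adjacent frame pairs; low values indicate frame-to-frame geometry that no single camera motion can explain, which is the kind of violation the benchmark aims to surface.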
Related papers
- Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach [42.581066866708085]
We present a novel video generation framework that integrates 3-dimensional geometry and dynamic awareness.
To achieve this, we augment 2D videos with 3D point trajectories and align them in pixel space.
The resulting 3D-aware video dataset, PointVid, is then used to fine-tune a latent diffusion model.
We regularize the shape and motion of objects in the video to eliminate undesired artifacts.
arXiv Detail & Related papers (2025-02-05T21:49:06Z)
- Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation [47.6666060652434]
We present an innovative framework that generates 3D models with accurate appearances and geometric structures.
By integrating text-to-3D generation with physics-grounded motion synthesis, our framework renders photo-realistic 3D objects.
arXiv Detail & Related papers (2024-12-07T06:48:16Z)
- ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model [16.14713604672497]
ReconX is a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task.
The proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition.
Guided by the condition, the video diffusion model then synthesizes video frames that are both detail-preserved and exhibit a high degree of 3D consistency.
arXiv Detail & Related papers (2024-08-29T17:59:40Z)
- What Matters in Detecting AI-Generated Videos like Sora? [51.05034165599385]
The gap between synthetic and real-world videos remains under-explored.
In this study, we compare real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion.
Our model is capable of detecting videos generated by Sora with high accuracy, even without exposure to any Sora videos during training.
arXiv Detail & Related papers (2024-06-27T23:03:58Z)
- VideoPhy: Evaluating Physical Commonsense for Video Generation [93.28748850301949]
We present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities.
We then generate videos conditioned on captions from diverse state-of-the-art text-to-video generative models.
Our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts.
arXiv Detail & Related papers (2024-06-05T17:53:55Z)
- 3D-Aware Video Generation [149.5230191060692]
We explore 4D generative adversarial networks (GANs) that learn to generate 3D-aware videos.
By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D-aware videos supervised only with monocular videos.
arXiv Detail & Related papers (2022-06-29T17:56:03Z)
- LASR: Learning Articulated Shape Reconstruction from a Monocular Video [97.92849567637819]
We introduce a template-free approach to learn 3D shapes from a single video.
Our method faithfully reconstructs nonrigid 3D structures from videos of humans, animals, and objects of unknown classes.
arXiv Detail & Related papers (2021-05-06T21:41:11Z)