Sora Generates Videos with Stunning Geometrical Consistency
- URL: http://arxiv.org/abs/2402.17403v1
- Date: Tue, 27 Feb 2024 10:49:05 GMT
- Title: Sora Generates Videos with Stunning Geometrical Consistency
- Authors: Xuanyi Li, Daquan Zhou, Chenxu Zhang, Shaodong Wei, Qibin Hou and
Ming-Ming Cheng
- Abstract summary: We introduce a new benchmark that assesses the quality of the generated videos based on their adherence to real-world physics principles.
We employ a method that transforms the generated videos into 3D models, leveraging the premise that the accuracy of 3D reconstruction is heavily contingent on the video quality.
- Score: 75.46675626542837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently developed Sora model [1] has exhibited remarkable capabilities
in video generation, sparking intense discussions regarding its ability to
simulate real-world phenomena. Despite its growing popularity, there is a lack
of established metrics to evaluate its fidelity to real-world physics
quantitatively. In this paper, we introduce a new benchmark that assesses the
quality of the generated videos based on their adherence to real-world physics
principles. We employ a method that transforms the generated videos into 3D
models, leveraging the premise that the accuracy of 3D reconstruction is
heavily contingent on the video quality. From the perspective of 3D
reconstruction, we use the fidelity of the geometric constraints satisfied by
the constructed 3D models as a proxy to gauge the extent to which the generated
videos conform to real-world physics rules. Project page:
https://sora-geometrical-consistency.github.io/
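The paper does not spell out its exact metric in this abstract, but a standard geometric constraint used to judge multi-view consistency is the epipolar constraint from two-view geometry: for genuinely 3D-consistent frames, corresponding points x1, x2 satisfy x2ᵀ E x1 = 0. The sketch below, a minimal illustration under assumed conditions (calibrated cameras, a hypothetical pure-translation camera motion, synthetic correspondences), scores a set of correspondences by their mean Sampson epipolar error; larger error indicates frames that no single rigid 3D scene can explain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scene: 3D points seen by two calibrated cameras related by a
# pure horizontal translation tx (hypothetical setup for illustration).
tx = 0.5
pts = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 6.0], size=(200, 3))

# Normalized homogeneous image coordinates in each view.
ones = np.ones(len(pts))
x1 = np.column_stack([pts[:, 0] / pts[:, 2], pts[:, 1] / pts[:, 2], ones])
x2 = np.column_stack([(pts[:, 0] + tx) / pts[:, 2], pts[:, 1] / pts[:, 2], ones])

# Essential matrix for a pure translation t = (tx, 0, 0): E = [t]_x.
E = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -tx],
              [0.0, tx, 0.0]])

def mean_sampson_error(E, x1, x2):
    """Mean first-order (Sampson) epipolar error over all correspondences."""
    Ex1 = x1 @ E.T                              # row i holds E @ x1_i
    Etx2 = x2 @ E                               # row i holds E.T @ x2_i
    num = np.einsum('ij,ij->i', x2, Ex1) ** 2   # (x2_i^T E x1_i)^2
    den = Ex1[:, 0]**2 + Ex1[:, 1]**2 + Etx2[:, 0]**2 + Etx2[:, 1]**2
    return float(np.mean(num / den))

# Geometrically consistent correspondences score near zero.
err_clean = mean_sampson_error(E, x1, x2)

# Correspondences from a "physically implausible" video violate the epipolar
# constraint; simulate this by perturbing the second view.
x2_noisy = x2.copy()
x2_noisy[:, :2] += rng.normal(0.0, 0.02, size=(len(pts), 2))
err_noisy = mean_sampson_error(E, x1, x2_noisy)
```

In a full pipeline along the lines the abstract describes, the correspondences would come from feature matching across generated frames and the camera geometry from structure-from-motion, with the residual constraint error serving as the physics-adherence proxy; the snippet above only demonstrates the scoring step on synthetic data.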
Related papers
- 3DSPA: A 3D Semantic Point Autoencoder for Evaluating Video Realism [2.6197884751430327]
We develop an automated evaluation framework for video realism which captures both semantics and coherent 3D structure. Our method, 3DSPA, is a 3D temporal point autoencoder that integrates 3D point trajectories, depth cues, and DINO semantic features into a unified representation for video evaluation. Experiments show that 3DSPA reliably identifies videos that violate physical laws, is more sensitive to motion artifacts, and aligns more closely with human judgments of video quality and realism.
arXiv Detail & Related papers (2026-02-23T21:00:48Z)
- Grab-3D: Detecting AI-Generated Videos from 3D Geometric Temporal Consistency [23.121660279216528]
Grab-3D is a geometry-aware transformer framework for detecting AI-generated videos based on 3D geometric temporal consistency. We propose a geometry-aware transformer equipped with geometric positional encoding, temporal-geometric attention, and an EMA-based geometric head to explicitly inject 3D geometric awareness into temporal modeling.
arXiv Detail & Related papers (2025-12-15T18:54:30Z)
- ViSA: 3D-Aware Video Shading for Real-Time Upper-Body Avatar Creation [62.86900540547787]
Current 3D avatar generation methods often suffer from artifacts such as blurry textures and stiff, unnatural motion. We propose a novel approach that combines the strengths of both paradigms. By uniting the geometric stability of 3D reconstruction with the generative capabilities of video models, our method produces high-fidelity digital avatars.
arXiv Detail & Related papers (2025-12-08T17:10:29Z)
- GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation [68.02988074681427]
Previous works leveraging video models for image-to-3D scene generation tend to suffer from geometric distortions and blurry content. In this paper, we renovate the pipeline of image-to-3D scene generation by unlocking the potential of geometry models. Our GeoWorld can generate high-fidelity 3D scenes from a single image and a given camera trajectory, outperforming prior methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2025-11-28T13:55:45Z)
- FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction [13.098585993121722]
We present FantasyWorld, a geometry-enhanced framework that augments frozen video foundation models with a trainable geometric branch. Our approach introduces cross-branch supervision, where geometry cues guide video generation and video priors regularize 3D prediction. Experiments show that FantasyWorld effectively bridges video imagination and 3D perception, outperforming recent geometry-consistent baselines in multi-view coherence and style consistency.
arXiv Detail & Related papers (2025-09-25T22:24:23Z)
- Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video [56.781766315691854]
We introduce Restage4D, a geometry-preserving pipeline for video-conditioned 4D restaging. We validate Restage4D on DAVIS and PointOdyssey, demonstrating improved geometry consistency, motion quality, and 3D tracking performance.
arXiv Detail & Related papers (2025-08-08T21:31:51Z)
- ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction [22.420752010237052]
We introduce ReVision, a plug-and-play framework that explicitly integrates parameterized 3D physical knowledge into a conditional video generation model.
We validate the effectiveness of our approach on Stable Video Diffusion, where ReVision significantly improves motion fidelity and coherence.
Our results suggest that, by incorporating 3D physical knowledge, even a relatively small video diffusion model can generate complex motions and interactions with greater realism and controllability.
arXiv Detail & Related papers (2025-04-30T17:59:56Z)
- Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach [42.581066866708085]
We present a novel video generation framework that integrates 3-dimensional geometry and dynamic awareness.
To achieve this, we augment 2D videos with 3D point trajectories and align them in pixel space.
The resulting 3D-aware video dataset, PointVid, is then used to fine-tune a latent diffusion model.
We regularize the shape and motion of objects in the video to eliminate undesired artifacts.
arXiv Detail & Related papers (2025-02-05T21:49:06Z)
- Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation [47.6666060652434]
We present an innovative framework that generates 3D models with accurate appearances and geometric structures.
By integrating text-to-3D generation with physics-grounded motion synthesis, our framework renders photo-realistic 3D objects.
arXiv Detail & Related papers (2024-12-07T06:48:16Z)
- ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model [16.14713604672497]
ReconX is a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task.
The proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition.
Guided by the condition, the video diffusion model then synthesizes video frames that are both detail-preserved and exhibit a high degree of 3D consistency.
arXiv Detail & Related papers (2024-08-29T17:59:40Z)
- What Matters in Detecting AI-Generated Videos like Sora? [51.05034165599385]
The gap between synthetic and real-world videos remains under-explored.
In this study, we compare real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion.
Our model is capable of detecting videos generated by Sora with high accuracy, even without exposure to any Sora videos during training.
arXiv Detail & Related papers (2024-06-27T23:03:58Z)
- VideoPhy: Evaluating Physical Commonsense for Video Generation [93.28748850301949]
We present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities.
We then generate videos conditioned on captions from diverse state-of-the-art text-to-video generative models.
Our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts.
arXiv Detail & Related papers (2024-06-05T17:53:55Z)
- Precise-Physics Driven Text-to-3D Generation [24.180947937863355]
We propose Phy3DGen, a precise-physics-driven text-to-3D generation method.
By analyzing the solid mechanics of generated 3D shapes, we reveal that the 3D shapes generated by existing text-to-3D generation methods are impractical for real-world applications.
arXiv Detail & Related papers (2024-03-19T04:51:38Z)
- Towards Live 3D Reconstruction from Wearable Video: An Evaluation of V-SLAM, NeRF, and Videogrammetry Techniques [20.514826446476267]
Mixed reality (MR) is a key technology which promises to change the future of warfare.
To enable this technology, a large-scale 3D model of a physical environment must be maintained based on live sensor observations.
We survey several 3D reconstruction algorithms for large-scale mapping for military applications given only live video.
arXiv Detail & Related papers (2022-11-21T19:57:51Z)
- 3D-Aware Video Generation [149.5230191060692]
We explore 4D generative adversarial networks (GANs) that learn generation of 3D-aware videos.
By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D video supervised only with monocular videos.
arXiv Detail & Related papers (2022-06-29T17:56:03Z)
- LASR: Learning Articulated Shape Reconstruction from a Monocular Video [97.92849567637819]
We introduce a template-free approach to learn 3D shapes from a single video.
Our method faithfully reconstructs nonrigid 3D structures from videos of humans, animals, and objects of unknown classes.
arXiv Detail & Related papers (2021-05-06T21:41:11Z)
- Online Adaptation for Consistent Mesh Reconstruction in the Wild [147.22708151409765]
We pose video-based reconstruction as a self-supervised online adaptation problem applied to any incoming test video.
We demonstrate that our algorithm recovers temporally consistent and reliable 3D structures from videos of non-rigid objects including those of animals captured in the wild.
arXiv Detail & Related papers (2020-12-06T07:22:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.