Coarse-to-Real: Generative Rendering for Populated Dynamic Scenes
- URL: http://arxiv.org/abs/2601.22301v1
- Date: Thu, 29 Jan 2026 20:29:04 GMT
- Title: Coarse-to-Real: Generative Rendering for Populated Dynamic Scenes
- Authors: Gonzalo Gomez-Nogales, Yicong Hong, Chongjian Ge, Marc Comino-Trinidad, Dan Casas, Yi Zhou,
- Abstract summary: We present C2R (Coarse-to-Real), a generative framework that synthesizes real-style urban crowd videos. Our approach uses coarse 3D renderings to explicitly control scene layout, camera motion, and human trajectories. It produces temporally consistent, controllable, and realistic urban scene videos from minimal 3D input.
- Score: 22.450051108066216
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traditional rendering pipelines rely on complex assets, accurate materials and lighting, and substantial computational resources to produce realistic imagery, yet they still face challenges in scalability and realism for populated dynamic scenes. We present C2R (Coarse-to-Real), a generative rendering framework that synthesizes real-style urban crowd videos from coarse 3D simulations. Our approach uses coarse 3D renderings to explicitly control scene layout, camera motion, and human trajectories, while a learned neural renderer generates realistic appearance, lighting, and fine-scale dynamics guided by text prompts. To overcome the lack of paired training data between coarse simulations and real videos, we adopt a two-phase mixed CG-real training strategy that learns a strong generative prior from large-scale real footage and introduces controllability through shared implicit spatio-temporal features across domains. The resulting system supports coarse-to-fine control, generalizes across diverse CG and game inputs, and produces temporally consistent, controllable, and realistic urban scene videos from minimal 3D input. We will release the model and project webpage at https://gonzalognogales.github.io/coarse2real/.
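The abstract describes a neural renderer conditioned on coarse 3D renderings (which fix layout, camera motion, and human trajectories) and on a text prompt (which controls appearance). As a rough illustration only, the sketch below shows one way such conditioning could be wired up in PyTorch: a small encoder turns the coarse frames into spatio-temporal features, which are added to a noisy video latent together with a projected text embedding before a toy denoiser. All module names, tensor shapes, and the simple additive conditioning scheme are assumptions for illustration, not the authors' released architecture.

```python
# Minimal sketch (not the authors' code) of coarse-to-real conditioning:
# coarse CG frames -> implicit spatio-temporal features -> inject into a
# video denoiser together with a text embedding. Shapes and modules are
# illustrative assumptions.
import torch
import torch.nn as nn


class CoarseEncoder(nn.Module):
    """Encodes a coarse RGB rendering (B, 3, T, H, W) into implicit features."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, feat_ch, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv3d(feat_ch, feat_ch, kernel_size=3, padding=1),
        )

    def forward(self, coarse_video):
        return self.net(coarse_video)


class ConditionedDenoiser(nn.Module):
    """Toy video denoiser: predicts noise from a noisy latent, coarse features,
    and a pooled text embedding (stand-in for a text-to-video backbone)."""
    def __init__(self, latent_ch=4, feat_ch=64, text_dim=512):
        super().__init__()
        self.inject = nn.Conv3d(feat_ch, latent_ch, kernel_size=1)
        self.text_proj = nn.Linear(text_dim, latent_ch)
        self.backbone = nn.Sequential(
            nn.Conv3d(latent_ch, 128, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv3d(128, latent_ch, kernel_size=3, padding=1),
        )

    def forward(self, noisy_latent, coarse_feat, text_emb):
        cond = self.inject(coarse_feat)                            # layout / motion control
        style = self.text_proj(text_emb)[:, :, None, None, None]  # appearance control
        return self.backbone(noisy_latent + cond + style)


if __name__ == "__main__":
    B, T, H, W = 1, 8, 32, 32
    encoder, denoiser = CoarseEncoder(), ConditionedDenoiser()
    coarse = torch.randn(B, 3, T, H, W)   # coarse 3D simulation frames
    latent = torch.randn(B, 4, T, H, W)   # noisy video latent
    prompt = torch.randn(B, 512)          # placeholder text embedding
    noise_pred = denoiser(latent, encoder(coarse), prompt)
    print(noise_pred.shape)  # torch.Size([1, 4, 8, 32, 32])
```

In such a setup, the coarse-feature pathway could be what is shared across CG and real domains during the two-phase training the abstract mentions, while the text pathway supplies the realistic appearance learned from real footage; the details here are a guess at the general pattern, not the paper's method.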
Related papers
- Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation [87.91642226587294]
Current learning-based 3D reconstruction methods rely on the availability of captured real-world multi-view data. We propose a self-distillation framework that distills the implicit 3D knowledge in video diffusion models into an explicit 3D Gaussian Splatting (3DGS) representation. Our framework achieves state-of-the-art performance in static and dynamic 3D scene generation.
arXiv Detail & Related papers (2025-09-23T17:58:01Z) - DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method to predict deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation [78.26308457952636]
This paper introduces R3D2, a lightweight, one-step diffusion model designed to overcome limitations in autonomous driving simulation. It enables realistic insertion of complete 3D assets into existing scenes by generating plausible rendering effects, such as shadows and consistent lighting, in real time. We show that R3D2 significantly enhances the realism of inserted assets, enabling use cases like text-to-3D asset insertion and cross-scene/dataset object transfer.
arXiv Detail & Related papers (2025-06-09T14:50:19Z) - Learning 3D-Gaussian Simulators from RGB Videos [20.250137125726265]
3DGSim is a 3D simulator that learns physical interactions from multi-view RGB videos. It unifies 3D scene reconstruction, particle dynamics prediction, and video synthesis into an end-to-end trained framework.
arXiv Detail & Related papers (2025-03-31T12:33:59Z) - MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling [21.1274747033854]
Character video synthesis aims to produce realistic videos of animatable characters within lifelike scenes. MIMO is a novel framework that can synthesize character videos with controllable attributes. It achieves advanced scalability to arbitrary characters, generality to novel 3D motions, and applicability to interactive real-world scenes.
arXiv Detail & Related papers (2024-09-24T15:00:07Z) - TC4D: Trajectory-Conditioned Text-to-4D Generation [94.90700997568158]
We propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components.
We learn local deformations that conform to the global trajectory using supervision from a text-to-video model.
Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion.
arXiv Detail & Related papers (2024-03-26T17:55:11Z) - Learning 3D Particle-based Simulators from RGB-D Videos [15.683877597215494]
We propose a method for learning simulators directly from observations.
Visual Particle Dynamics (VPD) jointly learns a latent particle-based representation of 3D scenes.
Unlike existing 2D video prediction models, VPD's 3D structure enables scene editing and long-term predictions.
arXiv Detail & Related papers (2023-12-08T20:45:34Z) - Real-Time Neural Rasterization for Large Scenes [39.198327570559684]
We propose a new method for realistic real-time novel-view synthesis of large scenes.
Existing neural rendering methods generate realistic results, but primarily work for small scale scenes.
Our work is the first to enable real-time rendering of large real-world scenes.
arXiv Detail & Related papers (2023-11-09T18:59:10Z) - Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments [66.83839051693695]
Continual learning refers to the ability of humans and animals to incrementally learn over time in a given environment.
We propose to leverage recent advances in 3D virtual environments in order to approach the automatic generation of potentially life-long dynamic scenes with photo-realistic appearance.
A novel element of this paper is that scenes are described in a parametric way, thus allowing the user to fully control the visual complexity of the input stream the agent perceives.
arXiv Detail & Related papers (2021-09-16T10:37:21Z) - SceneGen: Learning to Generate Realistic Traffic Scenes [92.98412203941912]
We present SceneGen, a neural autoregressive model of traffic scenes that eschews the need for rules and distributions.
We demonstrate SceneGen's ability to faithfully model distributions of real traffic scenes.
arXiv Detail & Related papers (2021-01-16T22:51:43Z)