StyledStreets: Multi-style Street Simulator with Spatial and Temporal Consistency
- URL: http://arxiv.org/abs/2503.21104v1
- Date: Thu, 27 Mar 2025 02:52:29 GMT
- Title: StyledStreets: Multi-style Street Simulator with Spatial and Temporal Consistency
- Authors: Yuyin Chen, Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Yifei Zhan, Xianpeng Lang
- Abstract summary: \textbf{StyledStreets} is a multi-style street simulator that achieves instruction-driven scene editing. A hybrid embedding scheme disentangles persistent scene geometry from transient style attributes. A unified parametric model prevents geometric drift through regularized updates.
- Score: 7.860619819904401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Urban scene reconstruction requires modeling both static infrastructure and dynamic elements while supporting diverse environmental conditions. We present \textbf{StyledStreets}, a multi-style street simulator that achieves instruction-driven scene editing with guaranteed spatial and temporal consistency. Building on a state-of-the-art Gaussian Splatting framework for street scenarios enhanced by our proposed pose optimization and multi-view training, our method enables photorealistic style transfers across seasons, weather conditions, and camera setups through three key innovations: First, a hybrid embedding scheme disentangles persistent scene geometry from transient style attributes, allowing realistic environmental edits while preserving structural integrity. Second, uncertainty-aware rendering mitigates supervision noise from diffusion priors, enabling robust training across extreme style variations. Third, a unified parametric model prevents geometric drift through regularized updates, maintaining multi-view consistency across seven vehicle-mounted cameras. Our framework preserves the original scene's motion patterns and geometric relationships. Qualitative results demonstrate plausible transitions between diverse conditions (snow, sandstorm, night), while quantitative evaluations show state-of-the-art geometric accuracy under style transfers. The approach establishes new capabilities for urban simulation, with applications in autonomous vehicle testing and augmented reality systems requiring reliable environmental consistency. Codes will be publicly available upon publication.
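The uncertainty-aware rendering idea can be illustrated with a heteroscedastic reconstruction loss, in which pixels with high predicted uncertainty contribute less to the objective. The function below is a hypothetical sketch of that general technique, not the paper's implementation; the name and the use of a per-pixel log-variance are assumptions.

```python
import numpy as np

def uncertainty_weighted_loss(rendered, target, log_var):
    """Heteroscedastic reconstruction loss: pixels with high predicted
    log-variance are down-weighted, damping noisy supervision such as
    imperfect diffusion-prior targets."""
    residual = np.abs(rendered - target)
    # exp(-log_var) shrinks the residual where uncertainty is high;
    # the +log_var term penalizes declaring everything uncertain.
    return float(np.mean(residual * np.exp(-log_var) + log_var))
```

With zero predicted log-variance this reduces to a plain mean absolute error, so the uncertainty head only changes training where the supervision is actually unreliable.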
Related papers
- SymDrive: Realistic and Controllable Driving Simulator via Symmetric Auto-regressive Online Restoration [37.202523124756034]
Current approaches often falter in large-angle novel view synthesis and suffer from geometric or lighting artifacts during asset manipulation. We propose SymDrive, a unified diffusion-based framework capable of joint high-quality rendering and scene editing. We demonstrate that SymDrive achieves photorealistic state-of-the-art performance in both novel-view enhancement and realistic 3D vehicle insertion.
arXiv Detail & Related papers (2025-12-25T10:28:43Z)
- Optimization-Guided Diffusion for Interactive Scene Generation [52.23368750264419]
We present OMEGA, an optimization-guided, training-free framework that enforces structural consistency and interaction awareness during diffusion-based sampling. We show that OMEGA improves generation realism, consistency, and controllability, increasing the ratio of physically and behaviorally valid scenes. Our approach can also generate $5\times$ more near-collision frames with a time-to-collision under three seconds.
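Optimization-guided sampling of this kind typically interleaves a denoising update with gradient steps on a differentiable scene cost. The toy sketch below shows only that generic interleaving; the function names, the identity "denoiser", and the quadratic stand-in cost are illustrative assumptions, not OMEGA's actual components.

```python
import numpy as np

def guided_sampling(x0, denoise_step, cost_grad, steps=50, guidance=0.1):
    """Interleave a generative denoising update with gradient descent
    on a differentiable scene cost, nudging samples toward
    structurally consistent configurations."""
    x = x0.copy()
    for t in range(steps, 0, -1):
        x = denoise_step(x, t)           # generative prior update
        x = x - guidance * cost_grad(x)  # optimization guidance step
    return x

# Toy usage: pull a 2D "agent position" toward the origin while a
# dummy denoiser leaves the sample unchanged.
identity_denoise = lambda x, t: x
quadratic_grad = lambda x: 2.0 * x   # gradient of ||x||^2
x = guided_sampling(np.array([4.0, -2.0]), identity_denoise, quadratic_grad)
```

Because the guidance gradient is applied at every step, the cost term steers the whole trajectory rather than just post-processing the final sample.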
arXiv Detail & Related papers (2025-12-08T15:56:18Z)
- HybridWorldSim: A Scalable and Controllable High-fidelity Simulator for Autonomous Driving [59.55918581964678]
HybridWorldSim is a hybrid simulation framework that integrates multi-traversal neural reconstruction for static backgrounds with generative modeling for dynamic agents. We release a new multi-traversal dataset, MIRROR, that captures a wide range of routes and environmental conditions across different cities.
arXiv Detail & Related papers (2025-11-27T07:53:16Z)
- G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior [53.762256749551284]
We identify accurate geometry as the fundamental prerequisite for effectively exploiting generative models to enhance 3D scene reconstruction. We incorporate this geometry guidance throughout the generative pipeline to improve visibility mask estimation, guide novel view selection, and enhance multi-view consistency when inpainting with video diffusion models. Our method naturally supports single-view inputs and unposed videos, with strong generalizability in both indoor and outdoor scenarios.
arXiv Detail & Related papers (2025-10-14T03:06:28Z)
- MagicRoad: Semantic-Aware 3D Road Surface Reconstruction via Obstacle Inpainting [4.090597563540577]
Road surface reconstruction is essential for autonomous driving, supporting centimeter-accurate lane perception and high-definition mapping in complex urban environments. We present a robust reconstruction framework that integrates 2D Gaussian surfels with semantic-guided color enhancement to recover clean, consistent road surfaces.
arXiv Detail & Related papers (2025-07-31T08:38:36Z)
- ArmGS: Composite Gaussian Appearance Refinement for Modeling Dynamic Urban Environments [22.371417505012566]
This work focuses on modeling dynamic urban environments for autonomous driving simulation. We propose a new approach named ArmGS that exploits composite driving Gaussian splatting with multi-granularity appearance refinement. This not only models global scene appearance variations between frames and camera viewpoints, but also models local fine-grained photorealistic changes of background and objects.
arXiv Detail & Related papers (2025-07-05T03:54:40Z)
- GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering [54.489285024494855]
Video stabilization is pivotal for video processing, as it removes unwanted shakiness while preserving the original user motion intent. Existing approaches, depending on the domain they operate in, suffer from several issues that degrade the user experience. We introduce \textbf{GaVS}, a novel 3D-grounded approach that reformulates video stabilization as a temporally-consistent 'local reconstruction and rendering' paradigm.
arXiv Detail & Related papers (2025-06-30T15:24:27Z)
- SceneCrafter: Controllable Multi-View Driving Scene Editing [44.91248700043744]
We propose SceneCrafter, a versatile editor for realistic 3D-consistent manipulation of driving scenes captured from multiple cameras. SceneCrafter achieves state-of-the-art realism, controllability, 3D consistency, and scene editing quality compared to existing baselines.
arXiv Detail & Related papers (2025-06-24T10:23:47Z)
- X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability [49.4647778989539]
X-Scene is a novel framework for large-scale driving scene generation. It achieves both geometric intricacy and appearance fidelity, while offering flexible controllability. X-Scene significantly advances controllability and fidelity for large-scale driving scene generation.
arXiv Detail & Related papers (2025-06-16T14:43:18Z)
- GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control [50.67481583744243]
We introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models. We propose a dynamic editing module during training to enhance the renderings by editing the positions of the vehicles. Our method significantly outperforms existing models in both action accuracy and 3D spatial awareness.
arXiv Detail & Related papers (2025-05-28T14:46:51Z)
- 3D Gaussian Splatting against Moving Objects for High-Fidelity Street Scene Reconstruction [1.2603104712715607]
This paper proposes a novel 3D Gaussian point distribution method for dynamic street scene reconstruction. Our approach eliminates moving objects while preserving high-fidelity static scene details. Experimental results demonstrate that our method achieves high reconstruction quality, improved rendering performance, and adaptability in large-scale dynamic environments.
arXiv Detail & Related papers (2025-03-15T05:41:59Z)
- RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation [30.744548212616007]
We present RoCo-Sim, the first simulation framework for roadside collaborative perception. RoCo-Sim is capable of generating diverse, multi-view consistent simulated roadside data. Code and pre-trained models will be released soon.
arXiv Detail & Related papers (2025-03-13T14:33:42Z)
- UrbanGS: Semantic-Guided Gaussian Splatting for Urban Scene Reconstruction [86.4386398262018]
UrbanGS uses 2D semantic maps and an existing dynamic Gaussian approach to distinguish static objects from the scene. For potentially dynamic objects, we aggregate temporal information using learnable time embeddings. Our approach outperforms state-of-the-art methods in reconstruction quality and efficiency.
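The "learnable time embedding" idea can be sketched as a per-timestep vector that is decoded into position offsets for Gaussians flagged as potentially dynamic, leaving static geometry untouched. Everything below (names, a single linear decoder, the tensor sizes) is an illustrative assumption, not UrbanGS's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch: one learnable embedding per timestep, decoded
# together with each Gaussian's base position into a motion offset.
num_timesteps, embed_dim = 10, 8
time_embed = 0.01 * rng.normal(size=(num_timesteps, embed_dim))  # learnable
decoder_w = 0.01 * rng.normal(size=(3 + embed_dim, 3))           # learnable

def dynamic_positions(base_xyz, dynamic_mask, t):
    """Positions at timestep t: dynamic Gaussians receive an offset
    decoded from their location and the time embedding; static
    Gaussians are returned unchanged."""
    n_dyn = int(dynamic_mask.sum())
    feats = np.concatenate(
        [base_xyz[dynamic_mask],
         np.broadcast_to(time_embed[t], (n_dyn, embed_dim))], axis=1)
    out = base_xyz.copy()
    out[dynamic_mask] = out[dynamic_mask] + np.tanh(feats @ decoder_w)
    return out
```

Keeping the offset conditioned on a shared time embedding (rather than free per-frame parameters) is what lets temporal information aggregate across observations of the same object.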
arXiv Detail & Related papers (2024-12-04T16:59:49Z)
- OmniRe: Omni Urban Scene Reconstruction [78.99262488964423]
We introduce OmniRe, a holistic approach for efficiently reconstructing high-fidelity dynamic urban scenes from on-device logs. OmniRe is a comprehensive 3DGS framework for driving scenes that allows for accurate, full-length reconstruction of diverse dynamic objects in a driving log.
arXiv Detail & Related papers (2024-08-29T17:56:33Z)
- AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction [17.600027937450342]
AutoSplat is a framework employing Gaussian splatting to achieve highly realistic reconstructions of autonomous driving scenes.
Our method enables multi-view consistent simulation of challenging scenarios including lane changes.
arXiv Detail & Related papers (2024-07-02T18:36:50Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- Modeling Ambient Scene Dynamics for Free-view Synthesis [31.233859111566613]
We introduce a novel method for dynamic free-view synthesis of ambient scenes from a monocular capture.
Our method builds upon the recent advancements in 3D Gaussian Splatting (3DGS) that can faithfully reconstruct complex static scenes.
arXiv Detail & Related papers (2024-06-13T17:59:11Z)
- ViiNeuS: Volumetric Initialization for Implicit Neural Surface reconstruction of urban scenes with limited image overlap [4.216707699421813]
ViiNeuS is a new hybrid implicit surface learning method that efficiently initializes the signed distance field. We show that ViiNeuS can learn an accurate and detailed 3D surface representation of various urban scenes while being two times faster to train.
arXiv Detail & Related papers (2024-03-15T14:31:17Z)
- SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
- Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering [49.36767999382054]
We present a unified representation model, called Periodic Vibration Gaussian (PVG). PVG builds upon the efficient 3D Gaussian splatting technique, originally designed for static scene representation. PVG exhibits a 900-fold acceleration in rendering over the best alternative.
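The "periodic vibration" of a Gaussian can be illustrated as a sinusoidal motion of its mean over time; the helper below is a minimal sketch of that parameterization (the name and argument layout are assumptions, not the authors' code):

```python
import numpy as np

def vibrating_mean(mu, amplitude, period, phase, t):
    """Periodic vibration of a Gaussian's mean over time:
    mu(t) = mu + amplitude * sin(2*pi*t/period + phase).
    Near-zero amplitude recovers a static Gaussian."""
    return mu + amplitude * np.sin(2.0 * np.pi * t / period + phase)
```

A single parametric family like this can represent both static and dynamic scene content, which is what makes the unified treatment of urban scenes possible.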
arXiv Detail & Related papers (2023-11-30T13:53:50Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Drivable Volumetric Avatars using Texel-Aligned Features [52.89305658071045]
Photorealistic telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance.
We propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people.
arXiv Detail & Related papers (2022-07-20T09:28:16Z)
- TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors [74.67698916175614]
We propose TrafficSim, a multi-agent behavior model for realistic traffic simulation.
In particular, we leverage an implicit latent variable model to parameterize a joint actor policy.
We show TrafficSim generates significantly more realistic and diverse traffic scenarios as compared to a diverse set of baselines.
arXiv Detail & Related papers (2021-01-17T00:29:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.