DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer
- URL: http://arxiv.org/abs/2602.24096v2
- Date: Thu, 05 Mar 2026 10:14:27 GMT
- Title: DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer
- Authors: Yuxuan Zhang, Katarína Tóthová, Zian Wang, Kangxue Yin, Haithem Turki, Riccardo de Lutio, Yen-Yu Chang, Or Litany, Sanja Fidler, Zan Gojcic
- Abstract summary: We introduce DiffusionHarmonizer, an online generative enhancement framework that transforms renderings into temporally consistent outputs. At its core is a single-step temporally-conditioned enhancer capable of running in online simulators on a single GPU.
- Score: 62.18680935878919
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Simulation is essential to the development and evaluation of autonomous robots such as self-driving vehicles. Neural reconstruction is emerging as a promising solution, as it enables simulating a wide variety of scenarios from real-world data alone in an automated and scalable way. However, while methods such as NeRF and 3D Gaussian Splatting can produce visually compelling results, they often exhibit artifacts, particularly when rendering novel views, and fail to realistically integrate inserted dynamic objects, especially when those objects were captured in different scenes. To overcome these limitations, we introduce DiffusionHarmonizer, an online generative enhancement framework that transforms renderings from such imperfect scenes into temporally consistent outputs while improving their realism. At its core is a single-step, temporally-conditioned enhancer, converted from a pretrained multi-step image diffusion model, that is capable of running in online simulators on a single GPU. The key to training it effectively is a custom data curation pipeline that constructs synthetic-real pairs emphasizing appearance harmonization, artifact correction, and lighting realism. The result is a scalable system that significantly elevates simulation fidelity in both research and production environments.
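As a rough illustration of the inference pattern the abstract describes, the sketch below runs a one-step, temporally-conditioned enhancer over a stream of rendered frames. The `SingleStepEnhancer` module, its tiny ConvNet architecture, and all shapes are hypothetical stand-ins; the paper's actual enhancer is distilled from a pretrained multi-step image diffusion model.

```python
# Minimal sketch of a single-step, temporally-conditioned enhancement loop.
# All names, shapes, and the toy ConvNet are assumptions for illustration;
# they stand in for the distilled diffusion model described in the abstract.
import torch
import torch.nn as nn

class SingleStepEnhancer(nn.Module):
    """Toy stand-in for a one-step enhancer distilled from a multi-step
    diffusion model; conditions on the previous enhanced frame for
    temporal consistency."""
    def __init__(self, channels: int = 3, hidden: int = 64):
        super().__init__()
        # Input: current raw rendering concatenated with the previous output.
        self.net = nn.Sequential(
            nn.Conv2d(channels * 2, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, rendering: torch.Tensor, prev_out: torch.Tensor) -> torch.Tensor:
        # Predict a residual correction in a single forward pass
        # (one step, so it fits an online simulation loop).
        return rendering + self.net(torch.cat([rendering, prev_out], dim=1))

enhancer = SingleStepEnhancer().eval()
prev = torch.zeros(1, 3, 256, 256)  # bootstrap the first frame
with torch.no_grad():
    for _ in range(5):                      # frames streamed from the simulator
        frame = torch.rand(1, 3, 256, 256)  # imperfect neural rendering
        prev = enhancer(frame, prev)        # enhanced, temporally conditioned
```

The single forward pass is what makes online use plausible: a multi-step diffusion sampler would need many network evaluations per frame, while a distilled one-step model needs exactly one.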
Related papers
- Mirage2Matter: A Physically Grounded Gaussian World Model from Video [87.9732484393686]
We present Simulate Anything, a graphics-driven world modeling and simulation framework. Our approach reconstructs real-world environments into a photorealistic scene representation using 3D Gaussian Splatting (3DGS). We then leverage generative models to recover a physically realistic representation and integrate it into a simulation environment via a precision calibration target.
arXiv Detail & Related papers (2026-01-24T07:43:57Z)
- ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction [26.402373173809753]
ReconDreamer-RL is a framework designed to integrate video diffusion priors into scene reconstruction to aid reinforcement learning. We show that ReconDreamer-RL improves end-to-end autonomous driving training, outperforming imitation learning methods with a 5x reduction in the Collision Ratio.
arXiv Detail & Related papers (2025-08-11T16:45:55Z)
- The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio [138.07247714782412]
MultiGen is a framework that integrates large-scale generative models into traditional physics simulators. We demonstrate effective zero-shot transfer to real-world pouring with novel containers and liquids.
arXiv Detail & Related papers (2025-07-03T17:59:58Z)
- R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation [78.26308457952636]
This paper introduces R3D2, a lightweight, one-step diffusion model designed to overcome limitations in autonomous driving simulation. It enables realistic insertion of complete 3D assets into existing scenes by generating plausible rendering effects, such as shadows and consistent lighting, in real time. We show that R3D2 significantly enhances the realism of inserted assets, enabling use cases like text-to-3D asset insertion and cross-scene/dataset object transfer.
arXiv Detail & Related papers (2025-06-09T14:50:19Z)
- Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation [1.0027737736304287]
We introduce a hybrid approach that combines the strengths of neural reconstruction with physics-based rendering. Our approach significantly enhances novel view synthesis quality, especially for road surfaces and lane markings. We achieve this by training a customized NeRF model on the original images with depth regularization derived from a noisy LiDAR point cloud.
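A minimal sketch of the depth-regularization idea in this summary, assuming per-ray rendered depths and a sparse depth map projected from the LiDAR point cloud; the function name, the L1 choice, and the weighting are illustrative assumptions, not the paper's exact loss.

```python
# Sketch of depth-regularized NeRF training: a photometric loss plus an
# L1 depth term on rays that have a (noisy) LiDAR-derived measurement.
import torch

def nerf_loss(pred_rgb: torch.Tensor,      # (N, 3) rendered ray colors
              gt_rgb: torch.Tensor,        # (N, 3) ground-truth pixel colors
              pred_depth: torch.Tensor,    # (N,) rendered ray depths
              lidar_depth: torch.Tensor,   # (N,) depth from projected LiDAR points
              lidar_valid: torch.Tensor,   # (N,) bool mask: ray covered by LiDAR
              depth_weight: float = 0.1) -> torch.Tensor:
    photometric = ((pred_rgb - gt_rgb) ** 2).mean()
    # Supervise depth only where LiDAR actually provides a measurement;
    # L1 is more tolerant of LiDAR noise than a squared penalty.
    if lidar_valid.any():
        depth_term = (pred_depth[lidar_valid] - lidar_depth[lidar_valid]).abs().mean()
    else:
        depth_term = pred_depth.new_zeros(())
    return photometric + depth_weight * depth_term
```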
arXiv Detail & Related papers (2025-03-12T15:18:50Z)
- Are NeRFs ready for autonomous driving? Towards closing the real-to-simulation gap [6.393953433174051]
We propose a novel perspective for addressing the real-to-simulated data gap.
We conduct the first large-scale investigation into the real-to-simulated data gap in an autonomous driving setting.
Our results show notable improvements in model robustness to simulated data, even improving real-world performance in some cases.
arXiv Detail & Related papers (2024-03-24T11:09:41Z)
- Closing the Visual Sim-to-Real Gap with Object-Composable NeRFs [59.12526668734703]
We introduce Composable Object Volume NeRF (COV-NeRF), an object-composable NeRF model that is the centerpiece of a real-to-sim pipeline.
COV-NeRF extracts objects from real images and composes them into new scenes, generating photorealistic renderings and many types of 2D and 3D supervision.
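The generic rule for compositing several object fields along one ray (sum the densities, density-weight the colors, then alpha-composite as usual) may help make the idea concrete. This is a textbook composition sketch under those assumptions, not COV-NeRF's actual implementation.

```python
# Compositing K per-object radiance fields along a single ray with S samples:
# densities add, colors blend by density, then standard volume rendering.
import torch

def composite_ray(sigmas: torch.Tensor,   # (K, S) per-object densities
                  colors: torch.Tensor,   # (K, S, 3) per-object colors
                  deltas: torch.Tensor) -> torch.Tensor:  # (S,) sample spacings
    sigma = sigmas.sum(dim=0)                              # combined density (S,)
    # Density-weighted blend of object colors at each sample point.
    w = sigmas / sigma.clamp_min(1e-8)                     # (K, S)
    color = (w.unsqueeze(-1) * colors).sum(dim=0)          # (S, 3)
    # Standard alpha compositing along the ray.
    alpha = 1.0 - torch.exp(-sigma * deltas)               # (S,)
    trans = torch.cumprod(
        torch.cat([alpha.new_ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = trans * alpha                                # (S,)
    return (weights.unsqueeze(-1) * color).sum(dim=0)      # (3,) pixel color
```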
arXiv Detail & Related papers (2024-03-07T00:00:02Z)
- RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation [110.4255414234771]
Existing solutions require massive training data or lack generalizability to unknown rendering configurations.
We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem.
Our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
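A toy example of the differentiable-rendering half of this idea: render an image from a guessed state, compare it to a target image, and backpropagate through the renderer to update the state. The 2D Gaussian-blob renderer below is a placeholder assumption, not RISP's pipeline.

```python
# Estimating a parameter through differentiable rendering gradients:
# the renderer is a toy differentiable function of the unknown state.
import torch

def render(center: torch.Tensor, size: int = 64) -> torch.Tensor:
    ys, xs = torch.meshgrid(torch.arange(size, dtype=torch.float32),
                            torch.arange(size, dtype=torch.float32),
                            indexing="ij")
    # Differentiable image of a Gaussian blob at `center` = (x, y).
    return torch.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / 50.0)

target = render(torch.tensor([40.0, 24.0]))              # image of the true state
state = torch.tensor([30.0, 20.0], requires_grad=True)   # initial guess
opt = torch.optim.Adam([state], lr=1.0)
for _ in range(200):
    opt.zero_grad()
    loss = ((render(state) - target) ** 2).mean()
    loss.backward()   # gradients flow through the renderer to the state
    opt.step()
# `state` converges toward [40, 24] purely from pixel differences.
```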
arXiv Detail & Related papers (2022-05-11T17:59:51Z)
- Inferring Articulated Rigid Body Dynamics from RGBD Video [18.154013621342266]
We introduce a pipeline that combines inverse rendering with differentiable simulation to create digital twins of real-world articulated mechanisms.
Our approach accurately reconstructs the kinematic tree of an articulated mechanism being manipulated by a robot.
arXiv Detail & Related papers (2022-03-20T08:19:02Z)