Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation
- URL: http://arxiv.org/abs/2503.09464v1
- Date: Wed, 12 Mar 2025 15:18:50 GMT
- Authors: Máté Tóth, Péter Kovács, Zoltán Bendefy, Zoltán Hortsin, Balázs Teréki, Tamás Matuszka
- Abstract summary: We introduce a hybrid approach that combines the strengths of neural reconstruction with physics-based rendering. Our approach significantly enhances novel view synthesis quality, especially for road surfaces and lane markings. We achieve this by training a customized NeRF model on the original images with depth regularization derived from a noisy LiDAR point cloud.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural reconstruction models for autonomous driving simulation have made significant strides in recent years, with dynamic models becoming increasingly prevalent. However, these models are typically limited to handling in-domain objects closely following their original trajectories. We introduce a hybrid approach that combines the strengths of neural reconstruction with physics-based rendering. This method enables the virtual placement of traditional mesh-based dynamic agents at arbitrary locations, adjustments to environmental conditions, and rendering from novel camera viewpoints. Our approach significantly enhances novel view synthesis quality, especially for road surfaces and lane markings, while maintaining interactive frame rates through our novel training method, NeRF2GS. This technique leverages the superior generalization capabilities of NeRF-based methods and the real-time rendering speed of 3D Gaussian Splatting (3DGS). We achieve this by training a customized NeRF model on the original images with depth regularization derived from a noisy LiDAR point cloud, then using it as a teacher model for 3DGS training, which provides accurate depth, surface normal, and camera appearance supervision. With our block-based training parallelization, the method can handle large-scale reconstructions (≥ 100,000 m²) and predict segmentation masks, surface normals, and depth maps. During simulation, it supports a rasterization-based rendering backend with depth-based composition and multiple camera models for real-time camera simulation, as well as a ray-traced backend for precise LiDAR simulation.
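To make the two-stage NeRF2GS recipe concrete, here is a minimal sketch of the two loss terms the abstract describes: a NeRF trained with a photometric loss plus depth regularization from the noisy LiDAR point cloud, and a 3DGS student supervised by the trained NeRF teacher's color, depth, and normal renderings. All function names, buffer names, and loss weights below are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def nerf_loss(rgb_pred, depth_pred, rgb_gt, lidar_depth, lidar_mask, lambda_depth=0.1):
    """Photometric loss plus depth regularization on rays with LiDAR returns."""
    photo = F.mse_loss(rgb_pred, rgb_gt)
    # LiDAR is sparse and noisy: supervise only pixels that have a return,
    # and use a robust L1 penalty rather than L2.
    valid = lidar_mask.sum().clamp(min=1)
    depth = (lidar_mask * (depth_pred - lidar_depth).abs()).sum() / valid
    return photo + lambda_depth * depth

def distill_loss(student, teacher, lambda_depth=0.5, lambda_normal=0.1):
    """3DGS student supervised by the frozen NeRF teacher's renderings:
    color, depth, and surface normals, as the abstract describes."""
    l_rgb = F.l1_loss(student["rgb"], teacher["rgb"])
    l_depth = F.l1_loss(student["depth"], teacher["depth"])
    # Normals compared with a cosine penalty: 1 - <n_student, n_teacher>.
    l_normal = (1.0 - (student["normal"] * teacher["normal"]).sum(dim=-1)).mean()
    return l_rgb + lambda_depth * l_depth + lambda_normal * l_normal
```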
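The "depth-based composition" used at simulation time can be pictured as a per-pixel depth test between the rasterized mesh agents and the neural background. The sketch below is an assumed implementation of that idea (all buffer names are hypothetical), not the paper's actual rendering backend.

```python
import numpy as np

def composite(neural_rgb, neural_depth, mesh_rgb, mesh_depth, mesh_alpha):
    """Merge a rasterized mesh agent into a neural render, keeping whichever
    source is closer to the camera at each pixel."""
    # A mesh pixel wins where the agent was actually drawn (alpha > 0)
    # and sits in front of the reconstructed scene.
    in_front = (mesh_alpha > 0) & (mesh_depth < neural_depth)
    mask = in_front[..., None].astype(neural_rgb.dtype)  # broadcast over RGB
    rgb = mask * mesh_rgb + (1.0 - mask) * neural_rgb
    depth = np.where(in_front, mesh_depth, neural_depth)
    return rgb, depth
```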
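Block-based training parallelization for scenes of 100,000 m² or more presumably amounts to tiling the scene and optimizing tiles independently. The following is a hedged sketch of such a partition; the tile size, overlap, and function name are invented parameters, not the authors' scheme.

```python
import numpy as np

def assign_cameras_to_blocks(cam_xy, block=200.0, overlap=20.0):
    """Map camera positions (N, 2) in metres to overlapping square tiles,
    so each tile can be reconstructed independently and in parallel."""
    tiles = {}
    for i, (x, y) in enumerate(np.asarray(cam_xy, dtype=float)):
        # Cameras near a border contribute to both neighbouring tiles,
        # which hides seams when the blocks are stitched back together.
        xs = {int((x - overlap) // block), int((x + overlap) // block)}
        ys = {int((y - overlap) // block), int((y + overlap) // block)}
        for bx in xs:
            for by in ys:
                tiles.setdefault((bx, by), []).append(i)
    return tiles
```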
Related papers
- HORT: Monocular Hand-held Objects Reconstruction with Transformers [61.36376511119355]
Reconstructing hand-held objects in 3D from monocular images is a significant challenge in computer vision.
We propose a transformer-based model to efficiently reconstruct dense 3D point clouds of hand-held objects.
Our method achieves state-of-the-art accuracy with much faster inference speed, while generalizing well to in-the-wild images.
arXiv Detail & Related papers (2025-03-27T09:45:09Z)
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization.
We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
- RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars [30.56664313203195]
We present Reduced Gaussian Blendshapes Avatar (RGBAvatar), a method for reconstructing photorealistic, animatable head avatars at speeds sufficient for on-the-fly reconstruction.
Our method maps tracked 3DMM parameters into reduced blendshape weights with an MLP, leading to a compact set of blendshape bases.
We propose a local-global sampling strategy that enables direct on-the-fly reconstruction, immediately reconstructing the model from video streams in real time while achieving quality comparable to offline settings.
arXiv Detail & Related papers (2025-03-17T07:31:21Z)
- OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving [12.47557991785691]
We propose OG-Gaussian, a novel approach that replaces LiDAR point clouds with Occupancy Grids (OGs) generated from surround-view camera images.
Our method leverages the semantic information in OGs to separate dynamic vehicles from the static street background, converting these grids into two distinct sets of initial point clouds for reconstructing both static and dynamic objects.
Experiments on the Waymo Open dataset demonstrate that OG-Gaussian is on par with the current state-of-the-art in terms of reconstruction quality and rendering speed, achieving an average PSNR of 35.13 and a rendering speed of 143 FPS.
arXiv Detail & Related papers (2025-02-20T04:00:47Z)
- EnvGS: Modeling View-Dependent Appearance with Environment Gaussian [78.74634059559891]
EnvGS is a novel approach that employs a set of Gaussian primitives as an explicit 3D representation for capturing reflections of environments.
To efficiently render these environment Gaussian primitives, we developed a ray-tracing-based renderer that leverages the GPU's RT core for fast rendering.
Results from multiple real-world and synthetic datasets demonstrate that our method produces significantly more detailed reflections.
arXiv Detail & Related papers (2024-12-19T18:59:57Z)
- GausSurf: Geometry-Guided 3D Gaussian Splatting for Surface Reconstruction [79.42244344704154]
GausSurf employs geometry guidance from multi-view consistency in texture-rich areas and normal priors in texture-less areas of a scene.
Our method surpasses state-of-the-art methods in terms of reconstruction quality and computation time.
arXiv Detail & Related papers (2024-11-29T03:54:54Z)
- TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene [25.164085646259856]
This paper introduces a template-free 3D semantic NeRF for dynamic scenes captured from sparse or single-view RGB videos.
By disentangling the motions of interacting entities and optimizing per-entity skinning weights, our method efficiently generates accurate, semantically separable geometries.
arXiv Detail & Related papers (2024-09-26T01:34:42Z)
- S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation [21.501865765631123]
S-NeRF++ is an innovative autonomous driving simulation system based on neural reconstruction.
S-NeRF++ is trained on widely-used self-driving datasets such as nuScenes and Waymo.
The system effectively utilizes noisy and sparse LiDAR data to refine training and address depth outliers.
arXiv Detail & Related papers (2024-02-03T10:35:42Z)
- SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes [75.9110646062442]
We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner.
Our method takes multi-view RGB videos and background images from static cameras with known camera parameters as input.
We show experimentally that, unlike prior work that only handles small motion, our method enables the reconstruction of studio-scale motions.
arXiv Detail & Related papers (2023-08-16T09:50:35Z)
- Neural Lens Modeling [50.57409162437732]
NeuroLens is a neural lens model for distortion and vignetting that can be used for point projection and ray casting.
It can be used to perform pre-capture calibration using classical calibration targets, and can later be used to perform calibration or refinement during 3D reconstruction.
The model generalizes across many lens types and is trivial to integrate into existing 3D reconstruction and rendering systems.
arXiv Detail & Related papers (2023-04-10T20:09:17Z)
- SMPLpix: Neural Avatars from 3D Human Models [56.85115800735619]
We bridge the gap between classic rendering and the latest generative networks operating in pixel space.
We train a network that directly converts a sparse set of 3D mesh vertices into photorealistic images.
We show the advantage over conventional differentiable renderers both in terms of the level of photorealism and rendering efficiency.
arXiv Detail & Related papers (2020-08-16T10:22:00Z)