ViSE: A Systematic Approach to Vision-Only Street-View Extrapolation
- URL: http://arxiv.org/abs/2510.18341v1
- Date: Tue, 21 Oct 2025 06:50:20 GMT
- Title: ViSE: A Systematic Approach to Vision-Only Street-View Extrapolation
- Authors: Kaiyuan Tan, Yingying Shen, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye,
- Abstract summary: This report presents our winning solution, which took first place in the RealADSim Workshop NVS track at ICCV 2025. To address the core challenges of street view extrapolation, we introduce a comprehensive four-stage pipeline. On the RealADSim-NVS benchmark, our method achieves a final score of 0.441, ranking first among all participants.
- Score: 8.962361530943976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Realistic view extrapolation is critical for closed-loop simulation in autonomous driving, yet it remains a significant challenge for current Novel View Synthesis (NVS) methods, which often produce distorted and inconsistent images beyond the original trajectory. This report presents our winning solution, which took first place in the RealADSim Workshop NVS track at ICCV 2025. To address the core challenges of street view extrapolation, we introduce a comprehensive four-stage pipeline. First, we employ a data-driven initialization strategy to generate a robust pseudo-LiDAR point cloud, avoiding local minima. Second, we inject strong geometric priors by modeling the road surface with a novel dimension-reduced SDF termed 2D-SDF. Third, we leverage a generative prior to create pseudo ground truth for extrapolated viewpoints, providing auxiliary supervision. Finally, a data-driven adaptation network removes time-specific artifacts. On the RealADSim-NVS benchmark, our method achieves a final score of 0.441, ranking first among all participants.
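The abstract does not detail how the "dimension-reduced SDF (2D-SDF)" is parameterized. A minimal sketch of one plausible reading: since a road is (locally) a single-valued surface over the ground plane, model it as a height field h(x, y) on a 2D grid, so the signed distance of a 3D query point reduces to its vertical offset from the interpolated surface. The bilinear grid below is a hypothetical stand-in for whatever learned 2D field the authors actually use.

```python
import numpy as np

class Road2DSDF:
    """Hypothetical sketch of a dimension-reduced road SDF: instead of a
    full 3D field f(x, y, z), the road is a height field h(x, y), and the
    signed 'distance' of a point is its vertical offset z - h(x, y)
    (positive above the road, negative below)."""

    def __init__(self, heights, cell=1.0):
        self.h = np.asarray(heights, dtype=float)  # (H, W) height grid
        self.cell = cell                           # grid cell size in meters

    def height(self, x, y):
        # Bilinear interpolation of the height grid at (x, y).
        gx, gy = x / self.cell, y / self.cell
        x0, y0 = int(np.floor(gx)), int(np.floor(gy))
        x1 = min(x0 + 1, self.h.shape[1] - 1)
        y1 = min(y0 + 1, self.h.shape[0] - 1)
        tx, ty = gx - x0, gy - y0
        top = (1 - tx) * self.h[y0, x0] + tx * self.h[y0, x1]
        bot = (1 - tx) * self.h[y1, x0] + tx * self.h[y1, x1]
        return (1 - ty) * top + ty * bot

    def sdf(self, x, y, z):
        # Signed offset from the interpolated road surface.
        return z - self.height(x, y)

flat = Road2DSDF(np.zeros((4, 4)))
print(flat.sdf(1.5, 1.5, 0.3))  # 0.3: point sits 0.3 m above a flat road
```

Collapsing one dimension this way makes the geometric prior cheap to evaluate and hard to violate, which matches the abstract's framing of the 2D-SDF as a strong road-surface prior.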
Related papers
- Hybrid Gaussian Splatting for Novel Urban View Synthesis [9.298287928508492]
This paper describes the Qualcomm AI Research solution to the RealADSim-NVS challenge, hosted at the RealADSim Workshop at ICCV 2025. The challenge concerns novel view synthesis in street scenes, and participants are required to generate renders of the same urban environment. Our solution is inspired by hybrid methods in scene generation and generative simulators, merging Gaussian splatting and diffusion models. On the public leaderboard reporting test results, our proposal reaches an aggregated score of 0.432, achieving second place overall.
arXiv Detail & Related papers (2025-10-14T09:09:13Z) - Efficient Virtuoso: A Latent Diffusion Transformer Model for Goal-Conditioned Trajectory Planning [0.0]
We present the Efficient Virtuoso, a conditional latent diffusion model for goal-conditioned trajectory planning. We demonstrate that our method achieves state-of-the-art performance on the Open Motion dataset, achieving a minimum Average Displacement Error (minADE) of 0.25. We provide a key insight: while a single goal can resolve strategic ambiguity, a richer, multi-step sparse route is essential for enabling the precise, high-fidelity tactical execution that mirrors nuanced human driving behavior.
arXiv Detail & Related papers (2025-09-03T19:18:02Z) - Extrapolated Urban View Synthesis Benchmark [53.657271730352214]
Photo simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs. Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes. We will release the data to help advance self-driving and urban robotics simulation technology.
arXiv Detail & Related papers (2024-12-06T18:41:39Z) - Deep Loss Convexification for Learning Iterative Models [11.36644967267829]
Iterative methods such as iterative closest point (ICP) for point cloud registration often suffer from bad local optimality.
We propose learning to form a convex landscape around each ground truth.
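The abstract does not specify how the convex landscape is enforced. One illustrative reading (not the paper's actual training loss): penalize violations of convexity along the segment from any estimate toward its ground truth, so the loss, viewed along that line, never exceeds the linear interpolation of its endpoint values, a star-convexity condition that leaves iterative solvers like ICP no spurious basins on the way to the optimum.

```python
import numpy as np

def star_convexity_penalty(loss_fn, x, gt, n=8):
    """Hinge penalty that is zero when loss_fn is convex along the
    segment from x to gt. An illustrative reading of 'forming a convex
    landscape around each ground truth', not the paper's method."""
    lx, lg = loss_fn(x), loss_fn(gt)
    pen = 0.0
    for t in np.linspace(0.0, 1.0, n):
        mid = (1 - t) * x + t * gt
        # Convexity along the segment: the loss at each interpolant must
        # not exceed the linear interpolation of the endpoint losses.
        pen += max(0.0, loss_fn(mid) - ((1 - t) * lx + t * lg))
    return pen

quadratic = lambda p: float(np.sum(p ** 2))                      # convex
bumpy = lambda p: float(np.sum(p ** 2) + np.sin(6 * p).sum())    # non-convex

x, gt = np.array([2.0]), np.array([0.0])
print(star_convexity_penalty(quadratic, x, gt))      # 0.0 for a convex loss
print(star_convexity_penalty(bumpy, x, gt) > 0)      # True: bumps are penalized
```

Minimizing such a penalty during training would push the learned loss toward a landscape where gradient-style iterations from any start converge to the ground truth.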
arXiv Detail & Related papers (2024-11-16T01:13:04Z) - OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z) - AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction [56.72301849123049]
We present our solution for the Vision-Centric 3D Occupancy and Flow Prediction track in the nuScenes Open-Occ dataset challenge at CVPR 2024.
Our innovative approach involves a dual-stage framework that enhances 3D occupancy and flow predictions by incorporating adaptive forward view transformation and flow modeling.
Our method combines regression with classification to address scale variations in different scenes, and leverages predicted flow to warp current voxel features to future frames, guided by future frame ground truth.
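The warping step described above can be sketched in a simplified, discrete form: each voxel's feature is moved to the cell its predicted flow points at in the future frame. This is an illustrative sketch under assumed integer voxel flows, not the AdaOcc implementation, which would operate on continuous flows and learned features.

```python
import numpy as np

def warp_voxels(features, flow):
    """Move voxel features along a per-voxel flow field to a future frame.

    features: (D, H, W, C) current voxel features
    flow:     (D, H, W, 3) integer voxel displacements (assumed discrete
              here for simplicity). Features landing in the same target
              cell are summed; out-of-grid targets are dropped.
    """
    D, H, W, _ = features.shape
    warped = np.zeros_like(features)
    for idx in np.ndindex(D, H, W):
        tgt = np.asarray(idx) + flow[idx]
        if all(0 <= t < s for t, s in zip(tgt, (D, H, W))):
            warped[tuple(tgt)] += features[idx]
    return warped

feat = np.zeros((2, 2, 2, 1))
feat[0, 0, 0, 0] = 1.0                  # one occupied voxel
flow = np.zeros((2, 2, 2, 3), dtype=int)
flow[0, 0, 0] = [1, 1, 0]               # its predicted motion
out = warp_voxels(feat, flow)
print(out[1, 1, 0, 0])  # 1.0: the feature arrived at its predicted cell
```

Supervising the warped features against future-frame ground truth, as the summary describes, couples the occupancy and flow heads so that flow errors show up directly as occupancy errors.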
arXiv Detail & Related papers (2024-07-01T16:32:15Z) - XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis [84.23233209017192]
This paper presents a synthetic dataset for novel driving view synthesis evaluation. It includes testing images captured by deviating from the training trajectory by $1-4$ meters. We establish the first realistic benchmark for evaluating existing NVS approaches under front-only and multi-camera settings.
arXiv Detail & Related papers (2024-06-26T14:00:21Z) - ViiNeuS: Volumetric Initialization for Implicit Neural Surface reconstruction of urban scenes with limited image overlap [4.216707699421813]
ViiNeuS is a new hybrid implicit surface learning method that efficiently initializes the signed distance field. We show that ViiNeuS can learn an accurate and detailed 3D surface representation of various urban scenes while being two times faster to train.
arXiv Detail & Related papers (2024-03-15T14:31:17Z) - StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views [6.35910814268525]
We present a novel multi-view implicit surface reconstruction technique, termed StreetSurf.
It is readily applicable to street view images in widely-used autonomous driving datasets, without necessarily requiring LiDAR data.
We achieve state-of-the-art reconstruction quality in both geometry and appearance within only one to two hours of training time.
arXiv Detail & Related papers (2023-06-08T07:19:27Z) - NeurAR: Neural Uncertainty for Autonomous 3D Reconstruction [64.36535692191343]
Implicit neural representations have shown compelling results in offline 3D reconstruction and also recently demonstrated the potential for online SLAM systems.
This paper addresses two key challenges: 1) seeking a criterion to measure the quality of the candidate viewpoints for the view planning based on the new representations, and 2) learning the criterion from data that can generalize to different scenes instead of hand-crafting one.
Our method demonstrates significant improvements on various metrics for the rendered image quality and the geometry quality of the reconstructed 3D models when compared with variants using TSDF or reconstruction without view planning.
arXiv Detail & Related papers (2022-07-22T10:05:36Z) - Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [55.7340077183072]
We tackle the task of estimating the 6D pose of an object from point cloud data.
Recent learning-based approaches to this task have shown great success on synthetic datasets but fail to generalize to real-world data.
We analyze the causes of these failures, which we trace back to the difference between the feature distributions of the source and target point clouds.
arXiv Detail & Related papers (2022-03-29T07:55:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.