Related papers: ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving

URL: http://arxiv.org/abs/2508.13977v2
Date: Tue, 16 Sep 2025 04:19:14 GMT
Title: ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving
Authors: Xianda Guo, Ruijun Zhang, Yiqun Duan, Ruilin Wang, Matteo Poggi, Keyuan Zhou, Wenzhao Zheng, Wenke Huang, Gangwei Xu, Mike Horton, Yuan Si, Qin Zou, Hao Zhao, Long Chen
Abstract summary: We present ROVR, a large-scale, diverse, and cost-efficient depth dataset designed to capture the complexity of real-world driving. A lightweight acquisition pipeline ensures scalable collection, while sparse but statistically sufficient ground truth supports robust training. Benchmarking with state-of-the-art monocular depth models reveals severe cross-dataset generalization failures.
Score: 62.9051914830949
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Depth estimation is a fundamental task for 3D scene understanding in autonomous driving, robotics, and augmented reality. Existing depth datasets, such as KITTI, nuScenes, and DDAD, have advanced the field but suffer from limitations in diversity and scalability. As benchmark performance on these datasets approaches saturation, there is an increasing need for a new generation of large-scale, diverse, and cost-efficient datasets to support the era of foundation models and multi-modal learning. We present ROVR, a large-scale, diverse, and cost-efficient depth dataset designed to capture the complexity of real-world driving. ROVR comprises 200K high-resolution frames across highway, rural, and urban scenarios, spanning day/night and adverse weather conditions. A lightweight acquisition pipeline ensures scalable collection, while sparse but statistically sufficient ground truth supports robust training. Benchmarking with state-of-the-art monocular depth models reveals severe cross-dataset generalization failures: models achieving near-ceiling accuracy on KITTI degrade drastically on ROVR, and even when trained on ROVR, current methods fall short of saturation. These results highlight the unique challenges posed by ROVR (scene diversity, dynamic environments, and sparse ground truth), establishing it as a demanding new platform for advancing depth estimation and building models with stronger real-world robustness. Extensive ablation studies provide a more intuitive understanding of our dataset across different scenarios, lighting conditions, and generalization ability.

Related papers

Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities. Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark. We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z)
UnLoc: Leveraging Depth Uncertainties for Floorplan Localization [80.55849461031879]
UnLoc is an efficient data-driven solution for sequential camera localization within floorplans. We introduce a novel probabilistic model that incorporates uncertainty estimation, modeling depth predictions as explicit probability distributions. We evaluate UnLoc on large-scale synthetic and real-world datasets, demonstrating significant improvements in terms of accuracy and robustness.
arXiv Detail & Related papers (2025-09-14T14:45:43Z)
Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation [75.30238170051291]
Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies. Traditional methods relying on hardware sensors like LiDAR are often limited by high costs, low resolution, and environmental sensitivity, limiting their applicability in real-world scenarios. Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either the low-capacity model architectures or the reliance on domain-specific and small-scale datasets.
arXiv Detail & Related papers (2025-07-15T17:59:59Z)
Depth as Points: Center Point-based Depth Estimation [25.930620717806914]
We develop a method for creating task- and scenario-specific datasets in a short time. We construct the virtual depth estimation dataset VirDepth, a large-scale, multi-task autonomous driving dataset. We also propose CenterDepth, a lightweight architecture for monocular depth estimation.
arXiv Detail & Related papers (2025-04-26T03:04:05Z)
PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments [73.80718037070773]
We present the multi-modal Pedestrian-Focused Scene dataset, rigorously annotated in semi-structured scenes with the format of nuScenes. We also propose a novel Hybrid Multi-Scale Fusion Network (HMFN) to detect pedestrians in densely populated and occluded scenarios.
arXiv Detail & Related papers (2025-02-21T09:57:53Z)
Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding [1.0445560141983634]
We propose a novel image-based semantic embedding that extracts contextual information directly from visual features. Our method achieves performance comparable to state-of-the-art models while addressing the shortcomings of CLIP embeddings in handling outdoor scenes.
arXiv Detail & Related papers (2025-02-01T15:37:22Z)
Real-time Multi-view Omnidirectional Depth Estimation for Real Scenarios based on Teacher-Student Learning with Unlabeled Data [13.107135855680992]
We propose a real-time omnidirectional depth estimation method for edge computing platforms named Rt-OmniMVS. To achieve high accuracy, robustness, and generalization in real-world environments, we introduce a teacher-student learning strategy. We also propose HexaMODE, an omnidirectional depth sensing system based on multi-view fisheye cameras and edge device.
arXiv Detail & Related papers (2024-09-12T08:44:35Z)
UdeerLID+: Integrating LiDAR, Image, and Relative Depth with Semi-Supervised [12.440461420762265]
Road segmentation is a critical task for autonomous driving systems. Our work introduces an innovative approach that integrates LiDAR point cloud data, visual images, and relative depth maps. One of the primary challenges is the scarcity of large-scale, accurately labeled datasets.
arXiv Detail & Related papers (2024-09-10T03:57:30Z)
PLT-D3: A High-fidelity Dynamic Driving Simulation Dataset for Stereo Depth and Scene Flow [0.0]
This paper introduces the Dynamic-weather Driving Dataset, a high-fidelity set of stereo depth and scene flow ground truth data generated using Unreal Engine 5. In particular, this dataset includes synchronized high-resolution stereo image sequences that replicate a wide array of dynamic weather scenarios. Benchmarks have been established for several critical autonomous driving tasks using Unreal-D3 to measure and enhance the performance of state-of-the-art models.
arXiv Detail & Related papers (2024-06-11T19:21:46Z)
DINO-SD: Champion Solution for ICRA 2024 RoboDepth Challenge [54.71866583204417]
In this report, we introduce DINO-SD, a novel surround-view depth estimation model. DINO-SD does not need additional data and has strong robustness. DINO-SD achieves the best performance in track 4 of the ICRA 2024 RoboDepth Challenge.
arXiv Detail & Related papers (2024-05-27T12:21:31Z)
RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving [67.09546127265034]
Road surface reconstruction helps to enhance the analysis and prediction of vehicle responses for motion planning and control systems. We introduce the Road Surface Reconstruction dataset, a real-world, high-resolution, and high-precision dataset collected with a specialized platform in diverse driving conditions. It covers common road types containing approximately 16,000 pairs of stereo images, original point clouds, and ground-truth depth/disparity maps.
arXiv Detail & Related papers (2023-10-03T17:59:32Z)
LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning. However, the promising results achieved on current public datasets may not be applicable to practical scenarios. We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
RELLIS-3D Dataset: Data, Benchmarks and Analysis [16.803548871633957]
RELLIS-3D is a multimodal dataset collected in an off-road environment. The data was collected on the Rellis Campus of Texas A&M University.
arXiv Detail & Related papers (2020-11-17T18:28:01Z)
Exploring the Impacts from Datasets to Monocular Depth Estimation (MDE) Models with MineNavi [5.689127984415125]
Current computer vision tasks based on deep learning require a huge amount of annotated data for model training or testing. In practice, manual labeling for dense estimation tasks is very difficult or even impossible, and the scenes of the dataset are often restricted to a small range. We propose a synthetic dataset generation method to obtain an expandable dataset without burdensome manual labor.
arXiv Detail & Related papers (2020-08-19T14:03:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.