The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods
- URL: http://arxiv.org/abs/2411.10546v1
- Date: Fri, 15 Nov 2024 19:43:24 GMT
- Title: The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods
- Authors: Yifu Tao, Miguel Ángel Muñoz-Bañón, Lintong Zhang, Jiahao Wang, Lanke Frank Tarimo Fu, Maurice Fallon
- Abstract summary: This paper introduces a large-scale multi-modal dataset captured in and around well-known landmarks in Oxford.
We also establish benchmarks for tasks involving localisation, reconstruction, and novel-view synthesis.
Our dataset and benchmarks are intended to facilitate better integration of radiance field methods and SLAM systems.
- Score: 10.265865092323041
- Abstract: This paper introduces a large-scale multi-modal dataset captured in and around well-known landmarks in Oxford using a custom-built multi-sensor perception unit as well as a millimetre-accurate map from a Terrestrial LiDAR Scanner (TLS). The perception unit includes three synchronised global-shutter colour cameras, an automotive 3D LiDAR scanner, and an inertial sensor, all precisely calibrated. We also establish benchmarks for tasks involving localisation, reconstruction, and novel-view synthesis, which enable the evaluation of Simultaneous Localisation and Mapping (SLAM) methods, Structure-from-Motion (SfM) and Multi-view Stereo (MVS) methods, as well as radiance field methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting. To evaluate 3D reconstruction, the TLS 3D models are used as ground truth. Localisation ground truth is computed by registering the mobile LiDAR scans to the TLS 3D models. Radiance field methods are evaluated not only with poses sampled from the input trajectory, but also from viewpoints on trajectories distant from the training poses. Our evaluation demonstrates a key limitation of state-of-the-art radiance field methods: they tend to overfit to the training poses/images and do not generalise well to out-of-sequence poses. They also underperform in 3D reconstruction compared to MVS systems using the same visual inputs. Our dataset and benchmarks are intended to facilitate better integration of radiance field methods and SLAM systems. The raw and processed data, along with software for parsing and evaluation, can be accessed at https://dynamic.robots.ox.ac.uk/datasets/oxford-spires/.
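The evaluation pipeline described in the abstract (registering mobile LiDAR scans to the TLS map to obtain localisation ground truth, and measuring reconstruction error against the TLS models) can be approximated with off-the-shelf tooling. The following is a minimal sketch using Open3D, not the dataset's released evaluation software; the file names, ICP correspondence distance, and 0.1 m precision threshold are illustrative assumptions.

```python
import numpy as np
import open3d as o3d

# Load the millimetre-accurate TLS reference map and a single mobile LiDAR scan.
# Paths are hypothetical placeholders, not actual dataset file names.
tls_map = o3d.io.read_point_cloud("tls_reference_map.pcd")
lidar_scan = o3d.io.read_point_cloud("mobile_lidar_scan.pcd")

# Localisation ground truth: refine a coarse pose prior by registering the
# mobile scan against the TLS map with point-to-plane ICP.
tls_map.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.5, max_nn=30))
init_pose = np.eye(4)  # in practice, an odometry or coarse-alignment prior
icp = o3d.pipelines.registration.registration_icp(
    lidar_scan, tls_map,
    max_correspondence_distance=0.5,  # metres; illustrative value
    init=init_pose,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
ground_truth_pose = icp.transformation  # 4x4 scan-to-map transform

# Reconstruction evaluation: accuracy as distances from a reconstructed cloud
# (e.g. an MVS or radiance-field export) to the TLS ground-truth model.
reconstruction = o3d.io.read_point_cloud("reconstruction.pcd")
dists = np.asarray(reconstruction.compute_point_cloud_distance(tls_map))
print(f"mean accuracy: {dists.mean():.3f} m, "
      f"precision@0.1 m: {(dists < 0.1).mean():.1%}")
```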
Related papers
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
- MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior rendering by enabling faster scale awareness and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
- Neural Rendering based Urban Scene Reconstruction for Autonomous Driving [8.007494499012624]
We propose multimodal 3D scene reconstruction using a framework combining neural implicit surfaces and radiance fields.
Dense 3D reconstruction has many applications in automated driving, including automated annotation validation.
We demonstrate qualitative and quantitative results on challenging automotive scenes.
arXiv Detail & Related papers (2024-02-09T23:20:23Z)
- UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Dense Voxel Fusion for 3D Object Detection [10.717415797194896]
Dense Voxel Fusion (DVF) is a sequential fusion method that generates multi-scale dense voxel feature representations.
We train directly with ground truth 2D bounding box labels, avoiding noisy, detector-specific, 2D predictions.
We show that our proposed multi-modal training strategy results in better generalization compared to training using erroneous 2D predictions.
arXiv Detail & Related papers (2022-03-02T04:51:31Z)
- MonoDistill: Learning Spatial Features for Monocular 3D Object Detection [80.74622486604886]
We propose a simple and effective scheme to introduce the spatial information from LiDAR signals to the monocular 3D detectors.
We use the resulting data to train a 3D detector with the same architecture as the baseline model.
Experimental results show that the proposed method can significantly boost the performance of the baseline model.
arXiv Detail & Related papers (2022-01-26T09:21:41Z)
- High-level camera-LiDAR fusion for 3D object detection with machine learning [0.0]
This paper tackles the 3D object detection problem, which is of vital importance for applications such as autonomous driving.
It uses a Machine Learning pipeline on a combination of monocular camera and LiDAR data to detect vehicles in the surrounding 3D space of a moving platform.
Our results demonstrate efficient and accurate inference on a validation set, achieving an overall accuracy of 87.1%.
arXiv Detail & Related papers (2021-05-24T01:57:34Z)
- It's All Around You: Range-Guided Cylindrical Network for 3D Object Detection [4.518012967046983]
This work presents a novel approach for analyzing 3D data produced by 360-degree depth scanners.
We introduce a novel notion of range-guided convolutions, adapting the receptive field by distance from the ego vehicle and the object's scale.
Our network demonstrates powerful results on the nuScenes challenge, comparable to current state-of-the-art architectures.
arXiv Detail & Related papers (2020-12-05T21:02:18Z)
- Characterization of Multiple 3D LiDARs for Localization and Mapping using Normal Distributions Transform [54.46473014276162]
We present a detailed comparison of ten different 3D LiDAR sensors, covering a range of manufacturers, models, and laser configurations, for the tasks of mapping and vehicle localization.
Data used in this study is a subset of our LiDAR Benchmarking and Reference (LIBRE) dataset, captured independently from each sensor, from a vehicle driven on public urban roads multiple times, at different times of the day.
We analyze the performance and characteristics of each LiDAR for the tasks of (1) 3D mapping, including an assessment of map quality based on mean map entropy (a minimal sketch of this metric appears after the list), and (2) 6-DOF localization using a ground truth reference map.
arXiv Detail & Related papers (2020-04-03T05:05:36Z)
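The mean map entropy (MME) metric mentioned in the last entry can be computed directly from a point cloud. Below is a minimal sketch based on the metric's usual definition, not code from the LIBRE paper; the neighbourhood radius, minimum-neighbour threshold, and file name are illustrative assumptions.

```python
import numpy as np
import open3d as o3d

def mean_map_entropy(pcd, radius=0.3, min_neighbours=10):
    """Average differential entropy of a Gaussian fitted to each point's local
    neighbourhood; lower values indicate a crisper, more consistent map."""
    tree = o3d.geometry.KDTreeFlann(pcd)
    points = np.asarray(pcd.points)
    entropies = []
    for p in points:
        _, idx, _ = tree.search_radius_vector_3d(p, radius)
        if len(idx) < min_neighbours:
            continue  # too few neighbours for a stable covariance estimate
        cov = np.cov(points[np.asarray(idx)].T)
        det = np.linalg.det(2.0 * np.pi * np.e * cov)
        if det > 0:
            entropies.append(0.5 * np.log(det))
    return float(np.mean(entropies))

# Usage: score a map built from one of the benchmarked LiDARs (path is hypothetical).
print(mean_map_entropy(o3d.io.read_point_cloud("ndt_map.pcd")))
```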
This list is automatically generated from the titles and abstracts of the papers on this site.