The P$^3$ dataset: Pixels, Points and Polygons for Multimodal Building Vectorization
- URL: http://arxiv.org/abs/2505.15379v1
- Date: Wed, 21 May 2025 11:16:29 GMT
- Title: The P$^3$ dataset: Pixels, Points and Polygons for Multimodal Building Vectorization
- Authors: Raphael Sulzer, Liuyun Duan, Nicolas Girard, Florent Lafarge
- Abstract summary: The P$^3$ dataset is a large-scale multimodal benchmark for building vectorization. The dataset contains over 10 billion LiDAR points with decimeter-level accuracy and RGB images at a ground sampling distance of 25 centimeters.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the P$^3$ dataset, a large-scale multimodal benchmark for building vectorization, constructed from aerial LiDAR point clouds, high-resolution aerial imagery, and vectorized 2D building outlines, collected across three continents. The dataset contains over 10 billion LiDAR points with decimeter-level accuracy and RGB images at a ground sampling distance of 25 centimeters. While many existing datasets primarily focus on the image modality, P$^3$ offers a complementary perspective by also incorporating dense 3D information. We demonstrate that LiDAR point clouds serve as a robust modality for predicting building polygons, both in hybrid and end-to-end learning frameworks. Moreover, fusing aerial LiDAR and imagery further improves the accuracy and geometric quality of predicted polygons. The P$^3$ dataset is publicly available, along with code and pretrained weights of three state-of-the-art models for building polygon prediction, at https://github.com/raphaelsulzer/PixelsPointsPolygons.
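Since the point clouds, images, and building outlines are georeferenced to the same ground, a simple way to fuse the modalities is to rasterize LiDAR heights onto the 25 cm image grid and stack the result with the RGB channels. The sketch below illustrates this early-fusion idea with NumPy; the function, tile layout, and grid size are assumptions for illustration, not the dataset's actual API (see the linked repository for that).

```python
import numpy as np

def lidar_to_height_grid(points, origin, gsd=0.25, shape=(512, 512)):
    """Rasterize LiDAR points (N, 3) into a max-height grid aligned with
    an RGB tile of the given shape and ground sampling distance (meters).

    `origin` is the (x, y) world coordinate of the tile's top-left corner.
    Illustrative sketch only, not the P^3 loading code."""
    grid = np.full(shape, np.nan, dtype=np.float32)
    # Map world x/y to pixel column/row (image rows grow downward).
    cols = ((points[:, 0] - origin[0]) / gsd).astype(int)
    rows = ((origin[1] - points[:, 1]) / gsd).astype(int)
    ok = (rows >= 0) & (rows < shape[0]) & (cols >= 0) & (cols < shape[1])
    for r, c, z in zip(rows[ok], cols[ok], points[ok, 2]):
        if np.isnan(grid[r, c]) or z > grid[r, c]:
            grid[r, c] = z  # keep the highest return per cell
    return grid
```

Stacking such a height channel with the RGB tile gives a 4-channel input for a polygon predictor; the hybrid and end-to-end frameworks evaluated in the paper fuse the modalities in more sophisticated ways.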
Related papers
- From Thousands to Billions: 3D Visual Language Grounding via Render-Supervised Distillation from 2D VLMs [64.28181017898369]
LIFT-GS predicts 3D Gaussian representations from point clouds and uses them to render predicted language-conditioned 3D masks into 2D views.
LIFT-GS achieves state-of-the-art results with 25.7% mAP on open-vocabulary instance segmentation.
Remarkably, pretraining effectively multiplies fine-tuning datasets by 2X, demonstrating strong scaling properties.
arXiv Detail & Related papers (2025-02-27T18:59:11Z)
- Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z)
- M$^2$-3DLaneNet: Exploring Multi-Modal 3D Lane Detection [30.250833348463633]
M$^2$-3DLaneNet lifts 2D features into 3D space by incorporating geometry information from LiDAR data through depth completion.
Experiments on the large-scale OpenLane dataset demonstrate the effectiveness of M$^2$-3DLaneNet regardless of detection range.
arXiv Detail & Related papers (2022-09-13T13:45:18Z)
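To make the depth-completion lifting above concrete: once sparse LiDAR has been densified into a full depth map, each pixel's 2D feature can be back-projected through the pinhole model into 3D. A minimal sketch with assumed shapes; it is not the paper's implementation.

```python
import numpy as np

def lift_features_to_3d(feat_2d, depth, K):
    """Back-project per-pixel 2D features into 3D camera coordinates
    using a (depth-completed) dense depth map.

    feat_2d: (H, W, C) features, depth: (H, W) metric depth, K: (3, 3)
    camera intrinsics. Illustrative sketch only."""
    H, W, C = feat_2d.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.reshape(-1)
    # Invert the pinhole projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    x = (us.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (vs.reshape(-1) - K[1, 2]) * z / K[1, 1]
    xyz = np.stack([x, y, z], axis=1)  # (H*W, 3) lifted 3D positions
    feats = feat_2d.reshape(-1, C)     # (H*W, C) features carried along
    return xyz, feats
```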
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
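The core alignment step, projecting 3D points into the image plane to gather pixel features, can be sketched as follows. This is a simplified stand-in for VPFNet's virtual-point aggregation, with assumed shapes and nearest-pixel sampling.

```python
import numpy as np

def sample_image_features_at_points(points, feat_2d, K):
    """Project 3D points (camera coordinates) into the image and gather
    per-pixel features; a toy version of feature aggregation at
    'virtual' points. points: (N, 3), feat_2d: (H, W, C), K: (3, 3)."""
    H, W, C = feat_2d.shape
    uvw = points @ K.T  # pinhole projection (homogeneous image coords)
    u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)
    valid = (points[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    feats = np.zeros((len(points), C), dtype=feat_2d.dtype)
    feats[valid] = feat_2d[v[valid], u[valid]]  # nearest-pixel gather
    return feats, valid
```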
- Frustum Fusion: Pseudo-LiDAR and LiDAR Fusion for 3D Detection [0.0]
We propose a novel data fusion algorithm to combine accurate point clouds with dense but less accurate point clouds obtained from stereo pairs.
We train multiple 3D object detection methods and show that our fusion strategy consistently improves the performance of detectors.
arXiv Detail & Related papers (2021-11-08T19:29:59Z)
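A minimal way to picture the fusion: keep the sparse-but-accurate LiDAR points and add dense pseudo-LiDAR (stereo) points only where a 2D detection suggests an object, i.e., inside the box's viewing frustum. The helper below is a hypothetical sketch, not the paper's algorithm.

```python
import numpy as np

def fuse_in_frustum(lidar_pts, pseudo_pts, box2d, K):
    """Merge LiDAR points with stereo pseudo-LiDAR points that project
    inside a 2D detection box. box2d = (u_min, v_min, u_max, v_max);
    both point sets are (N, 3) in camera coordinates. Illustrative only."""
    def in_frustum(pts):
        uvw = pts @ K.T
        u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]
        return (pts[:, 2] > 0) & (u >= box2d[0]) & (u <= box2d[2]) \
               & (v >= box2d[1]) & (v <= box2d[3])
    # All LiDAR evidence in the frustum, densified by stereo points.
    return np.concatenate([lidar_pts[in_frustum(lidar_pts)],
                           pseudo_pts[in_frustum(pseudo_pts)]], axis=0)
```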
- PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21) [68.12101204123422]
A point cloud is a dense collection of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT).
arXiv Detail & Related papers (2021-06-03T05:36:39Z)
- Semantic Segmentation on Swiss3DCities: A Benchmark Study on Aerial Photogrammetric 3D Pointcloud Dataset [67.44497676652173]
We introduce a new outdoor urban 3D pointcloud dataset, covering a total area of 2.7 km$^2$, sampled from three Swiss cities.
The dataset is manually annotated for semantic segmentation with per-point labels, and is built using photogrammetry from images acquired by multirotors equipped with high-resolution cameras.
arXiv Detail & Related papers (2020-12-23T21:48:47Z)
- KAPLAN: A 3D Point Descriptor for Shape Completion [80.15764700137383]
KAPLAN is a 3D point descriptor that aggregates local shape information via a series of 2D convolutions.
In each of those planes, point properties like normals or point-to-plane distances are aggregated into a 2D grid and abstracted into a feature representation with an efficient 2D convolutional encoder.
Experiments on public datasets show that KAPLAN achieves state-of-the-art performance for 3D shape completion.
arXiv Detail & Related papers (2020-07-31T21:56:08Z)
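The per-plane aggregation can be pictured as binning a property of the local neighborhood, here the signed point-to-plane distance, into a small 2D grid that a 2D CNN can then encode. The sketch below is in the spirit of that step, with assumed parameters; it is not the KAPLAN reference code.

```python
import numpy as np

def aggregate_on_plane(points, center, normal, grid_size=16, extent=0.5):
    """Bin signed point-to-plane distances of a local neighborhood into a
    2D grid defined by a plane through `center` with the given `normal`.
    points: (N, 3) neighbors; extent: half-size of the grid in meters."""
    n = normal / np.linalg.norm(normal)
    u = np.cross(n, [0.0, 0.0, 1.0])    # build an in-plane basis (u, v)
    if np.linalg.norm(u) < 1e-6:        # normal parallel to z: pick another axis
        u = np.cross(n, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    d = points - center
    iu = ((d @ u + extent) / (2 * extent) * grid_size).astype(int)
    iv = ((d @ v + extent) / (2 * extent) * grid_size).astype(int)
    dist = d @ n                        # signed point-to-plane distance
    grid = np.zeros((grid_size, grid_size), dtype=np.float32)
    count = np.zeros_like(grid)
    ok = (iu >= 0) & (iu < grid_size) & (iv >= 0) & (iv < grid_size)
    np.add.at(grid, (iv[ok], iu[ok]), dist[ok])
    np.add.at(count, (iv[ok], iu[ok]), 1)
    return grid / np.maximum(count, 1)  # mean distance per cell
```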
- Polylidar3D -- Fast Polygon Extraction from 3D Data [0.0]
Flat surfaces captured by 3D point clouds are often used for localization, mapping, and modeling.
We demonstrate Polylidar3D on aerial LiDAR point clouds for rooftop mapping, autonomous driving LiDAR point clouds for road surface detection, and RGBD cameras for wall detection.
Results consistently show excellent accuracy.
arXiv Detail & Related papers (2020-07-23T15:22:43Z)
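For intuition, a toy version of flat-surface polygon extraction: select points near the dominant height and trace their outline. Note that Polylidar3D extracts concave polygons with holes; the sketch below substitutes a convex hull (via SciPy) to keep the example short.

```python
import numpy as np
from scipy.spatial import ConvexHull

def flat_region_to_polygon(points, z_tol=0.05):
    """Extract a 2D outline for a roughly horizontal flat region:
    keep points near the median height, then take their convex hull.
    A toy stand-in for Polylidar3D's concave polygon extraction."""
    z0 = np.median(points[:, 2])
    flat = points[np.abs(points[:, 2] - z0) < z_tol]  # near-planar subset
    hull = ConvexHull(flat[:, :2])      # 2D hull of the x/y coordinates
    return flat[hull.vertices, :2]      # ordered polygon vertices
```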
This list is automatically generated from the titles and abstracts of the papers on this site.