Crowdsourced 3D Mapping: A Combined Multi-View Geometry and
Self-Supervised Learning Approach
- URL: http://arxiv.org/abs/2007.12918v1
- Date: Sat, 25 Jul 2020 12:10:16 GMT
- Title: Crowdsourced 3D Mapping: A Combined Multi-View Geometry and
Self-Supervised Learning Approach
- Authors: Hemang Chawla, Matti Jukola, Terence Brouns, Elahe Arani, and Bahram
Zonooz
- Abstract summary: We propose a framework that estimates the 3D positions of semantically meaningful landmarks without assuming known camera intrinsics.
We utilize multi-view geometry as well as deep-learning-based self-calibration, depth, and ego-motion estimation for traffic sign positioning.
We achieve an average single-journey relative and absolute positioning accuracy of 39 cm and 1.26 m, respectively.
- Score: 10.610403488989428
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to efficiently utilize crowdsourced visual data carries
immense potential for the domains of large-scale dynamic mapping and autonomous
driving. However, state-of-the-art methods for crowdsourced 3D mapping assume
prior knowledge of camera intrinsics. In this work, we propose a framework that
estimates the 3D positions of semantically meaningful landmarks such as traffic
signs without assuming known camera intrinsics, using only a monocular color
camera and GPS. We utilize multi-view geometry as well as deep-learning-based
self-calibration, depth, and ego-motion estimation for traffic sign
positioning, and show that combining their strengths is important for
increasing map coverage. To facilitate research on this task, we construct and
make available a KITTI-based 3D traffic sign ground-truth positioning dataset.
Using our proposed framework, we achieve an average single-journey relative and
absolute positioning accuracy of 39 cm and 1.26 m, respectively, on this
dataset.
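As a concrete illustration of the multi-view-geometry half of the framework, the sketch below triangulates a traffic sign's 3D position from its pixel detections in two frames using OpenCV. The intrinsics, poses, and pixel coordinates are illustrative assumptions: in the paper, intrinsics come from self-calibration and poses from GPS combined with ego-motion estimation.

```python
# Minimal two-view triangulation sketch; all numeric values are assumptions.
import cv2
import numpy as np

# Assumed intrinsics -- in the paper these are recovered by self-calibration.
K = np.array([[718.0, 0.0, 620.0],
              [0.0, 718.0, 188.0],
              [0.0, 0.0, 1.0]])

# Hypothetical world-to-camera poses for two frames, e.g. derived from GPS
# positions aligned with estimated ego-motion (~1.5 m baseline here).
R1, t1 = np.eye(3), np.zeros((3, 1))
R2, t2 = np.eye(3), np.array([[-1.5], [0.0], [0.0]])

P1 = K @ np.hstack([R1, t1])  # 3x4 projection matrix, frame 1
P2 = K @ np.hstack([R2, t2])  # 3x4 projection matrix, frame 2

# Pixel detections of the same sign in both frames (illustrative values,
# e.g. bounding-box centers from a traffic sign detector).
x1 = np.array([[900.0], [150.0]])
x2 = np.array([[760.0], [152.0]])

X_h = cv2.triangulatePoints(P1, P2, x1, x2)  # homogeneous 4x1 point
X = (X_h[:3] / X_h[3]).ravel()               # Euclidean 3D sign position
print("triangulated sign position (m):", X)
```

In a crowdsourced setting, per-journey estimates like this would then be aggregated across journeys; the learned depth and ego-motion branch can stand in where feature-based geometry fails, which is how combining the two increases map coverage.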
Related papers
- MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations [55.022519020409405]
This paper builds MMScan, the largest multi-modal 3D scene dataset and benchmark to date with hierarchical grounded language annotations.
The resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks.
arXiv Detail & Related papers (2024-06-13T17:59:30Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns memory-efficient, dense 3D geometry and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
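As a rough sketch of the neural-fields-based mapping this entry mentions, the following is a generic semantic field: a small MLP that maps a 3D point to occupancy and per-class semantic logits. The architecture, sizes, and class count are assumptions for illustration, not the paper's actual design.

```python
# Generic semantic neural field sketch (illustrative, not the paper's model).
import torch
import torch.nn as nn

class SemanticField(nn.Module):
    def __init__(self, num_classes: int = 20, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.occupancy = nn.Linear(hidden, 1)            # dense geometry
        self.semantics = nn.Linear(hidden, num_classes)  # per-point labels

    def forward(self, xyz: torch.Tensor):
        h = self.trunk(xyz)
        return torch.sigmoid(self.occupancy(h)), self.semantics(h)

occ, logits = SemanticField()(torch.rand(1024, 3))  # query 1024 3D points
print(occ.shape, logits.shape)
```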
- Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z)
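The second point above, gradients from an image-branch objective flowing back into the point-cloud backbone, can be sketched in a few lines of PyTorch. The modules and shapes are stand-ins, not UPIDet's actual components.

```python
# Sketch: an image-branch auxiliary loss updates the point-cloud backbone
# through a cross-modal fusion step. All modules/shapes are illustrative.
import torch
import torch.nn as nn

point_backbone = nn.Linear(3, 64)     # stand-in point-cloud feature extractor
image_branch = nn.Linear(64 + 64, 2)  # stand-in 2D auxiliary head
image_features = torch.rand(16, 64)   # pretend image CNN features

point_feats = point_backbone(torch.rand(16, 3))           # point features
fused = torch.cat([point_feats, image_features], dim=-1)  # cross-modal fusion
aux_loss = image_branch(fused).square().mean()            # 2D auxiliary loss
aux_loss.backward()

# The point backbone received gradients from the image branch's objective.
print(point_backbone.weight.grad is not None)  # True
```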
- Satellite Image Based Cross-view Localization for Autonomous Vehicle [59.72040418584396]
This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we can achieve cross-view vehicle localization with satisfactory accuracy.
Our method is validated on the KITTI and Ford Multi-AV Seasonal datasets as the ground view, with Google Maps as the satellite view.
arXiv Detail & Related papers (2022-07-27T13:16:39Z)
- Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data [19.63193201107591]
7DoF prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users.
We develop a weakly supervised method for fine-tuning 3D object detectors for traffic observation cameras.
Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to the top-performing monocular 3D object detectors on autonomous vehicle datasets.
arXiv Detail & Related papers (2021-10-21T08:26:48Z)
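A 7DoF vehicle pose in such detectors is commonly parameterized as a 3D box: center, dimensions, and heading. Below is a minimal container under that common convention; the paper's exact parameterization may differ.

```python
# 7DoF box sketch: 3D center + 3D dimensions + yaw (a common convention).
from dataclasses import dataclass
import math

@dataclass
class Box7DoF:
    x: float; y: float; z: float                # center (m)
    length: float; width: float; height: float  # dimensions (m)
    yaw: float                                  # heading about the up axis (rad)

    def bev_corners(self):
        """Four bird's-eye-view corners of the box footprint."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        offsets = [(dx, dy)
                   for dx in (-self.length / 2, self.length / 2)
                   for dy in (-self.width / 2, self.width / 2)]
        return [(self.x + c * dx - s * dy, self.y + s * dx + c * dy)
                for dx, dy in offsets]

print(Box7DoF(10.0, 2.0, 0.0, 4.5, 1.8, 1.5, math.pi / 6).bev_corners())
```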
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
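One round of the kind of neural message passing used for data association can be sketched as follows: every track-detection edge is embedded from its two endpoint features, then scored as an association affinity. Layer sizes and the greedy readout are illustrative assumptions, not the paper's design.

```python
# Message-passing data association sketch (dimensions are illustrative).
import torch
import torch.nn as nn

dim = 32
tracks, dets = torch.rand(5, dim), torch.rand(7, dim)  # node embeddings
edge_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
score_mlp = nn.Linear(dim, 1)

# Build all track-detection edges and pass one round of messages.
t = tracks.unsqueeze(1).expand(-1, dets.size(0), -1)  # 5x7xdim
d = dets.unsqueeze(0).expand(tracks.size(0), -1, -1)  # 5x7xdim
edges = edge_mlp(torch.cat([t, d], dim=-1))           # edge embeddings
affinity = score_mlp(edges).squeeze(-1)               # 5x7 association scores

# Greedy readout for illustration; trainable systems solve this jointly.
print(affinity.argmax(dim=1))
```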
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- Monocular Vision based Crowdsourced 3D Traffic Sign Positioning with Unknown Camera Intrinsics and Distortion Coefficients [11.38332845467423]
We demonstrate an approach to computing 3D traffic sign positions without knowing the camera focal lengths, principal point, and distortion coefficients a priori.
We achieve an average single-journey relative and absolute positioning accuracy of 0.26 m and 1.38 m, respectively.
arXiv Detail & Related papers (2020-07-09T07:03:17Z)
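The unknowns this entry deals with, focal length, principal point, and radial distortion, enter through the standard radial distortion model. Once those parameters are estimated, distorted detections are mapped to normalized rays by inverting that model; here is a sketch with assumed parameter values, not the paper's estimates.

```python
# Invert the radial distortion model x_d = x_u * (1 + k1*r^2 + k2*r^4)
# by fixed-point iteration. All parameter values below are assumptions.
import numpy as np

def undistort_normalized(uv, f, cx, cy, k1, k2, iters=5):
    """Map distorted pixels to normalized, undistorted image coordinates."""
    xd = (uv - np.array([cx, cy])) / f  # distorted normalized coordinates
    xu = xd.copy()
    for _ in range(iters):
        r2 = (xu ** 2).sum(axis=-1, keepdims=True)
        xu = xd / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return xu  # append z = 1 to obtain ray directions for triangulation

print(undistort_normalized(np.array([[900.0, 150.0]]),
                           f=700.0, cx=640.0, cy=360.0, k1=-0.3, k2=0.1))
```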
- Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks [27.86228863466213]
We present a simple, unified approach for estimating maps directly from monocular images using a single end-to-end deep learning architecture.
We demonstrate the effectiveness of our approach by evaluating against several challenging baselines on the NuScenes and Argoverse datasets.
arXiv Detail & Related papers (2020-03-30T12:39:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.