From 2D to 3D: AISG-SLA Visual Localization Challenge
- URL: http://arxiv.org/abs/2407.18590v1
- Date: Fri, 26 Jul 2024 08:27:26 GMT
- Title: From 2D to 3D: AISG-SLA Visual Localization Challenge
- Authors: Jialin Gao, Bill Ong, Darld Lwi, Zhen Hao Ng, Xun Wei Yee, Mun-Thye Mak, Wee Siong Ng, See-Kiong Ng, Hui Ying Teo, Victor Khoo, Georg Bökman, Johan Edstedt, Kirill Brodt, Clémentin Boittiaux, Maxime Ferrera, Stepan Konev,
- Abstract summary: We organized the AISG-SLA Visual Localization Challenge (VLC) at IJCAI 2023.
The challenge attracted over 300 participants worldwide, forming 50+ teams.
Winning teams achieved high accuracy in pose estimation using images from a car-mounted camera with low frame rates.
- Score: 16.39998393991086
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Research in 3D mapping is crucial for smart city applications, yet the cost of acquiring 3D data often hinders progress. Visual localization, particularly monocular camera position estimation, offers a solution by determining the camera's pose solely through visual cues. However, this task is challenging due to limited data from a single camera. To tackle these challenges, we organized the AISG-SLA Visual Localization Challenge (VLC) at IJCAI 2023 to explore how AI can accurately extract camera pose data from 2D images in 3D space. The challenge attracted over 300 participants worldwide, forming 50+ teams. Winning teams achieved high accuracy in pose estimation using images from a car-mounted camera with low frame rates. The VLC dataset is available for research purposes upon request via vlc-dataset@aisingapore.org.
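The core task in the challenge is estimating camera pose from 2D images alone. As a point of reference (not the method used by any winning team), the sketch below shows a minimal classical baseline for monocular relative pose between two consecutive frames using OpenCV: match local features, fit an essential matrix with RANSAC, and decompose it into rotation and translation. The frame file names and the intrinsic matrix K are placeholder assumptions.

```python
import cv2
import numpy as np

# Placeholder inputs: two consecutive frames from a car-mounted camera and
# an assumed 3x3 intrinsic matrix K (all values here are illustrative).
img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

# Detect and match ORB features between the two frames.
orb = cv2.ORB_create(nfeatures=4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Fit the essential matrix robustly, then decompose it into a relative
# rotation R and a unit-length translation direction t.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print("relative rotation:\n", R)
print("translation direction (up to scale):\n", t.ravel())
```

Note that a single camera recovers translation only up to scale, and the wide baselines produced by low frame rates make matching harder; both issues are central to what made the VLC task challenging.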
Related papers
- Improving Distant 3D Object Detection Using 2D Box Supervision [97.80225758259147]
We propose LR3D, a framework that learns to recover the missing depth of distant objects.
Our framework is general and can broadly benefit existing 3D detection methods.
arXiv Detail & Related papers (2024-03-14T09:54:31Z)
- Overview of the L3DAS23 Challenge on Audio-Visual Extended Reality [15.034352805342937]
The primary goal of the L3DAS23 Signal Processing Grand Challenge at ICASSP 2023 is to promote and support collaborative research on machine learning for 3D audio signal processing.
We provide a brand-new dataset, which maintains the same general characteristics of the L3DAS21 and L3DAS22 datasets.
We propose updated baseline models for both tasks that now support audio-image pairs as input, along with a supporting API to replicate our results.
arXiv Detail & Related papers (2024-02-14T15:34:28Z)
- NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations [64.95582364215548]
NAVI is a new dataset of category-agnostic image collections with high-quality 3D scans and per-image 2D-3D alignments.
These 2D-3D alignments allow us to extract accurate derivative annotations such as dense pixel correspondences, depth and segmentation maps.
arXiv Detail & Related papers (2023-06-15T13:11:30Z)
- SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments [0.0]
We present SLOPER4D, a novel scene-aware dataset collected in large urban environments.
We record 12 human subjects' activities over 10 diverse urban scenes from an egocentric view.
SLOPER4D consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters.
arXiv Detail & Related papers (2023-03-16T05:54:15Z)
- EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries [68.75400888770793]
We formalize a pipeline that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos.
Specifically, our approach achieves an overall success rate of up to 87.12%, which sets a new state-of-the-art result in the VQ3D task.
arXiv Detail & Related papers (2022-12-14T01:28:12Z)
- Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task [48.555440807415664]
We present Rope3D, the first high-diversity, challenging roadside perception 3D dataset, captured from a novel viewpoint.
The dataset consists of 50k images and over 1.5M 3D objects in various scenes.
We propose to leverage geometry constraints to resolve the inherent ambiguities caused by varying sensors and viewpoints.
arXiv Detail & Related papers (2022-03-25T12:13:23Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- Towards Generalization of 3D Human Pose Estimation In The Wild [73.19542580408971]
3DBodyTex.Pose is a dataset that addresses the task of 3D human pose estimation in-the-wild.
3DBodyTex.Pose offers high quality and rich data containing 405 different real subjects in various clothing and poses, and 81k image samples with ground-truth 2D and 3D pose annotations.
arXiv Detail & Related papers (2020-04-21T13:31:58Z)
- Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS [13.191601826570786]
We present a novel solution for multi-human 3D pose estimation from multiple calibrated camera views.
It takes 2D poses in different camera coordinate frames as input and estimates accurate 3D poses in the global coordinate frame (a minimal triangulation sketch appears after this list).
We propose a new large-scale multi-human dataset with 12 to 28 camera views.
arXiv Detail & Related papers (2020-03-09T08:54:00Z)
- Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras [13.24490469380487]
We present an effective multi-view approach to end-to-end learning of precise manipulation tasks that are 3D in nature.
Our method learns to accomplish these tasks using multiple statically placed but uncalibrated RGB camera views without building an explicit 3D representation such as a point cloud or voxel grid.
arXiv Detail & Related papers (2020-02-21T03:28:42Z)
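As referenced in the cross-view tracking entry above, recovering 3D poses from 2D detections in calibrated views reduces, at its geometric core, to triangulation. The sketch below illustrates that general step with OpenCV's linear triangulation; it is not that paper's pipeline, and all camera parameters and pixel coordinates are made-up placeholders.

```python
import cv2
import numpy as np

# Two calibrated cameras with illustrative projection matrices P = K [R | t].
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])      # reference (global) frame
R2, _ = cv2.Rodrigues(np.array([0.0, 0.3, 0.0]))       # second camera, yawed 0.3 rad
t2 = np.array([[-1.0], [0.0], [0.0]])                  # with a 1 m baseline
P2 = K @ np.hstack([R2, t2])

# Matched 2D joint detections in each view, one (x, y) column per joint.
joints_cam1 = np.array([[630.0, 350.0],
                        [655.0, 410.0]]).T             # 2xN pixels, placeholder values
joints_cam2 = np.array([[820.0, 352.0],
                        [845.0, 412.0]]).T

# Linear (DLT) triangulation returns homogeneous 4xN points; divide by w.
X_h = cv2.triangulatePoints(P1, P2, joints_cam1, joints_cam2)
joints_3d = (X_h[:3] / X_h[3]).T                       # Nx3 joints in the global frame
print(joints_3d)
```

In a real multi-human system the hard part is associating detections across views and over time; the triangulation itself is a cheap, well-understood final step, which is how such pipelines reach 100+ FPS.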
This list is automatically generated from the titles and abstracts of the papers on this site.