ViFi-Loc: Multi-modal Pedestrian Localization using GAN with
Camera-Phone Correspondences
- URL: http://arxiv.org/abs/2211.12021v1
- Date: Tue, 22 Nov 2022 05:27:38 GMT
- Title: ViFi-Loc: Multi-modal Pedestrian Localization using GAN with
Camera-Phone Correspondences
- Authors: Hansi Liu, Kristin Dana, Marco Gruteser, Hongsheng Lu
- Abstract summary: We propose a Generative Adversarial Network architecture to produce more accurate location estimations for pedestrians.
During training, it learns the underlying linkage between pedestrians' camera-phone data correspondences.
We show that our GAN produces 3D coordinates with 1 to 2 meter localization error across 5 different outdoor scenes.
- Score: 7.953401800573514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In Smart City and Vehicle-to-Everything (V2X) systems, acquiring pedestrians'
accurate locations is crucial to traffic safety. Current systems adopt cameras
and wireless sensors to detect and estimate people's locations via sensor
fusion. Standard fusion algorithms, however, become inapplicable when the
multi-modal data are not associated, for example when pedestrians are outside
the camera's field of view or data from the camera modality is missing. To address this
challenge and produce more accurate location estimations for pedestrians, we
propose a Generative Adversarial Network (GAN) architecture. During training,
it learns the underlying linkage between pedestrians' camera-phone data
correspondences. During inference, it generates refined position estimations
based only on pedestrians' phone data, which consists of GPS, IMU, and Wi-Fi
Fine Time Measurement (FTM) readings. Results show that our GAN produces 3D
coordinates with 1 to 2 meter localization error across 5 different outdoor
scenes. We further show that the proposed
model supports self-learning. The generated coordinates can be associated with
pedestrians' bounding box coordinates to obtain additional camera-phone data
correspondences. This allows automatic data collection during inference. After
fine-tuning on the expanded dataset, localization accuracy is improved by up to
26%.
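As a concrete illustration of the idea in the abstract, below is a minimal sketch (in PyTorch) of a conditional GAN whose generator maps phone measurements plus noise to a 3D position, and whose discriminator judges whether a (phone features, position) pair looks like a genuine camera-phone correspondence. The feature dimensions, layer sizes, and training loop here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

PHONE_DIM = 2 + 6 + 1   # assumed: GPS (lat, lon) + IMU (3-axis accel + gyro) + FTM range
NOISE_DIM = 16          # latent noise fed to the generator
POS_DIM = 3             # 3D pedestrian coordinates

class Generator(nn.Module):
    """Maps phone features + noise to a refined 3D position."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PHONE_DIM + NOISE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, POS_DIM),
        )

    def forward(self, phone, z):
        return self.net(torch.cat([phone, z], dim=-1))

class Discriminator(nn.Module):
    """Scores whether a (phone features, position) pair looks real."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PHONE_DIM + POS_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),   # raw logit; BCEWithLogitsLoss applies the sigmoid
        )

    def forward(self, phone, pos):
        return self.net(torch.cat([phone, pos], dim=-1))

def train_step(G, D, phone, cam_pos, opt_g, opt_d):
    """One adversarial step on a batch of camera-phone correspondences:
    `phone` holds phone features, `cam_pos` the camera-derived positions."""
    bce = nn.BCEWithLogitsLoss()
    n = phone.size(0)
    z = torch.randn(n, NOISE_DIM)
    fake_pos = G(phone, z)

    # Discriminator update: real correspondences vs. generated positions.
    opt_d.zero_grad()
    d_loss = (bce(D(phone, cam_pos), torch.ones(n, 1))
              + bce(D(phone, fake_pos.detach()), torch.zeros(n, 1)))
    d_loss.backward()
    opt_d.step()

    # Generator update: make generated positions indistinguishable from real ones.
    opt_g.zero_grad()
    g_loss = bce(D(phone, fake_pos), torch.ones(n, 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

At inference time only the generator would be used: sample z and call G on the phone features to obtain a refined 3D estimate. Per the abstract, such estimates could then be associated with pedestrians' bounding boxes to mint additional correspondences for fine-tuning.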
Related papers
- Pedestrian Environment Model for Automated Driving [54.16257759472116]
We propose an environment model that includes the position of the pedestrians as well as their pose information.
We extract the skeletal information from the image with a neural-network human pose estimator.
To obtain 3D position information, we aggregate data from consecutive frames in conjunction with the vehicle position.
arXiv Detail & Related papers (2023-08-17T16:10:58Z)
- UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z)
- SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving [41.221988979184665]
SUPS is a simulated dataset for underground automatic parking.
It supports multiple tasks with multiple sensors and multiple semantic labels aligned with successive images.
We also evaluate the state-of-the-art SLAM algorithms and perception models on our dataset.
arXiv Detail & Related papers (2023-02-25T02:59:12Z)
- Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain.
The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose.
The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z)
- aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception [0.0]
This dataset consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view.
The collected data was captured in highway, urban, and suburban areas, in daytime, at night, and in rain.
We trained unimodal and multimodal baseline models for 3D object detection.
arXiv Detail & Related papers (2022-11-17T10:19:59Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Cross-Camera Trajectories Help Person Retrieval in a Camera Network [124.65912458467643]
Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network.
We propose a pedestrian retrieval framework based on cross-camera trajectory generation, which integrates both temporal and spatial information.
To verify the effectiveness of our method, we construct the first cross-camera pedestrian trajectory dataset.
arXiv Detail & Related papers (2022-04-27T13:10:48Z)
- Automatic Map Update Using Dashcam Videos [1.6911482053867475]
We propose an SfM-based solution for automatic map update, with a focus on real-time change detection and localization.
Our system can locate the objects detected from 2D images in a 3D space, utilizing sparse SfM point clouds.
arXiv Detail & Related papers (2021-09-24T18:00:57Z)
- Cross-Camera Feature Prediction for Intra-Camera Supervised Person Re-identification across Distant Scenes [70.30052164401178]
Person re-identification (Re-ID) aims to match person images across non-overlapping camera views.
ICS-DS Re-ID uses cross-camera unpaired data with intra-camera identity labels for training.
A cross-camera feature prediction method is used to mine cross-camera self-supervision information.
Joint learning of global-level and local-level features forms a global-local cross-camera feature prediction scheme.
arXiv Detail & Related papers (2021-07-29T11:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.