Reconsidering utility: unveiling the limitations of synthetic mobility data generation algorithms in real-life scenarios
- URL: http://arxiv.org/abs/2407.03237v1
- Date: Wed, 3 Jul 2024 16:08:05 GMT
- Title: Reconsidering utility: unveiling the limitations of synthetic mobility data generation algorithms in real-life scenarios
- Authors: Alexandra Kapp, Helena Mihaljević
- Abstract summary: We evaluate the utility of five state-of-the-art synthesis approaches in terms of real-world applicability.
We focus on so-called trip data that encode fine granular urban movements such as GPS-tracked taxi rides.
One model fails to produce data within reasonable time and another generates too many jumps to meet the requirements for map matching.
- Score: 49.1574468325115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, there has been a surge in the development of models for the generation of synthetic mobility data. These models aim to facilitate the sharing of data while safeguarding privacy, all while ensuring high utility and flexibility regarding potential applications. However, current utility evaluation methods fail to fully account for real-life requirements. We evaluate the utility of five state-of-the-art synthesis approaches, each with and without the incorporation of differential privacy (DP) guarantees, in terms of real-world applicability. Specifically, we focus on so-called trip data that encode fine granular urban movements such as GPS-tracked taxi rides. Such data prove particularly valuable for downstream tasks at the road network level. Thus, our initial step involves appropriately map matching the synthetic data and subsequently comparing the resulting trips with those generated by the routing algorithm implemented in OpenStreetMap, which serves as an efficient and privacy-friendly baseline. Out of the five evaluated models, one fails to produce data within reasonable computation time and another generates too many jumps to meet the requirements for map matching. The remaining three models succeed to a certain degree in maintaining spatial distribution, one even with DP guarantees. However, all models struggle to produce meaningful sequences of geo-locations with reasonable trip lengths and to model traffic flow at intersections accurately. It is important to note that trip data encompasses various relevant characteristics beyond spatial distribution, such as temporal information, all of which are discarded by these models. Consequently, our results imply that current synthesis models fall short in their promise of high utility and flexibility.
Related papers
- OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
- Deep Temporal Deaggregation: Large-Scale Spatio-Temporal Generative Models [5.816964541847194]
We propose a transformer-based diffusion model, TDDPM, for time-series which outperforms and scales substantially better than state-of-the-art.
This is evaluated in a new comprehensive benchmark across several sequence lengths, standard datasets, and evaluation measures.
arXiv Detail & Related papers (2024-06-18T09:16:11Z)
- ST-DPGAN: A Privacy-preserving Framework for Spatiotemporal Data Generation [19.18074489351738]
We propose a graph-based model for generating privacy-protected data.
Experiments conducted on three real-world spatiotemporal datasets validate the efficacy of our model.
The prediction model trained on our generated data maintains a competitive edge compared to the model trained on the original data.
arXiv Detail & Related papers (2024-06-04T04:43:54Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds [79.00975648564483]
Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios.
This dataset provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective.
The objective is to predict the future positions of agents relative to the robot using raw sensory input data.
arXiv Detail & Related papers (2023-11-05T18:59:31Z)
- LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
- GIPSO: Geometrically Informed Propagation for Online Adaptation in 3D LiDAR Segmentation [60.07812405063708]
3D point cloud semantic segmentation is fundamental for autonomous driving.
Most approaches in the literature neglect an important aspect, i.e., how to deal with domain shift when handling dynamic scenes.
This paper advances the state of the art in this research field.
arXiv Detail & Related papers (2022-07-20T09:06:07Z)
- Virtual passengers for real car solutions: synthetic datasets [2.1028463367241033]
We build a 3D scenario and set-up to resemble reality as closely as possible.
It is possible to configure and vary parameters to add randomness to the scene.
We present the process and concept of synthetic data generation in an automotive context.
arXiv Detail & Related papers (2022-05-13T10:54:39Z)
- Mobility Inference on Long-Tailed Sparse Trajectory [2.4444287331956898]
We propose a single trajectory inference algorithm that utilizes a generic long-tailed sparsity pattern in the large-scale trajectory data.
The algorithm guarantees 100% precision in stay/travel inference, with a provable lower bound on recall.
Evaluations with three trajectory data sets of 40 million urban users validate the performance guarantees of the proposed inference algorithm.
arXiv Detail & Related papers (2020-01-21T16:32:38Z)