A Systematic Evaluation of Generative Models on Tabular Transportation Data
- URL: http://arxiv.org/abs/2502.08856v1
- Date: Thu, 13 Feb 2025 00:14:55 GMT
- Title: A Systematic Evaluation of Generative Models on Tabular Transportation Data
- Authors: Chengen Wang, Alvaro Cardenas, Gurcan Comert, Murat Kantarcioglu
- Abstract summary: We use New York City taxi data as a case study to evaluate the performance of widely used data generative models.
We introduce an improved privacy metric to address the limitations of the commonly used one.
This work underscores the potential need to develop generative models specifically tailored to take advantage of the unique characteristics of emerging domains, such as transportation.
- Score: 14.566059618333155
- Abstract: The sharing of large-scale transportation data is beneficial for transportation planning and policymaking. However, it also raises significant security and privacy concerns, as the data may include identifiable personal information, such as individuals' home locations. To address these concerns, synthetic data generation based on real transportation data offers a promising solution that allows privacy protection while potentially preserving data utility. Although there are various synthetic data generation techniques, they are often not tailored to the unique characteristics of transportation data, such as the inherent structure of transportation networks formed by all trips in the datasets. In this paper, we use New York City taxi data as a case study to conduct a systematic evaluation of the performance of widely used tabular data generative models. In addition to traditional metrics such as distribution similarity, coverage, and privacy preservation, we propose a novel graph-based metric tailored specifically for transportation data. This metric evaluates the similarity between real and synthetic transportation networks, providing potentially deeper insights into their structural and functional alignment. We also introduce an improved privacy metric to address the limitations of the commonly used one. Our experimental results reveal that existing tabular data generative models often fail to perform as consistently as claimed in the literature, particularly when applied to transportation data use cases. Furthermore, our novel graph metric reveals a significant gap between synthetic and real data. This work underscores the potential need to develop generative models specifically tailored to take advantage of the unique characteristics of emerging domains, such as transportation.
Related papers
- Contrastive Learning-Based privacy metrics in Tabular Synthetic Datasets [40.67424997797513]
Synthetic data has garnered attention as a Privacy Enhancing Technology (PET) in sectors such as healthcare and finance.
Similarity-based methods aim at finding the level of similarity between training and synthetic data.
Attack-based methods conduct deliberate attacks on synthetic datasets.
arXiv Detail & Related papers (2025-02-19T15:52:23Z)
- Generative Models for Synthetic Urban Mobility Data: A Systematic Literature Review [44.99833362998488]
This systematic review provides a structured comparative overview of the current state of this heterogeneous, active field of research.
A special focus is put on the applicability of the reviewed models in practice.
arXiv Detail & Related papers (2024-07-12T11:54:29Z)
- Reconsidering utility: unveiling the limitations of synthetic mobility data generation algorithms in real-life scenarios [49.1574468325115]
We evaluate the utility of five state-of-the-art synthesis approaches in terms of real-world applicability.
We focus on so-called trip data that encode fine granular urban movements such as GPS-tracked taxi rides.
One model fails to produce data within reasonable time and another generates too many jumps to meet the requirements for map matching.
arXiv Detail & Related papers (2024-07-03T16:08:05Z)
- UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction [93.77809355002591]
We introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria.
We conduct extensive experiments and find that model performance significantly drops when transferred to other datasets.
We provide insights into dataset characteristics to explain these findings.
arXiv Detail & Related papers (2024-03-22T10:36:50Z)
- JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds [79.00975648564483]
Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios.
This dataset provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective.
The objective is to predict the future positions of agents relative to the robot using raw sensory input data.
arXiv Detail & Related papers (2023-11-05T18:59:31Z)
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
- Synthetic Data: Methods, Use Cases, and Risks [11.413309528464632]
A possible alternative gaining momentum in both the research community and industry is to share synthetic data instead.
We provide a gentle introduction to synthetic data and discuss its use cases, the privacy challenges that are still unaddressed, and its inherent limitations as an effective privacy-enhancing technology.
arXiv Detail & Related papers (2023-03-01T16:35:33Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Generating synthetic mobility data for a realistic population with RNNs to improve utility and privacy [3.3918638314432936]
We present a system for generating synthetic mobility data using a deep recurrent neural network (RNN).
The system takes a population distribution as input and generates mobility traces for a corresponding synthetic population.
We show the generated mobility data retain the characteristics of the real data, while varying from the real data at the individual level.
arXiv Detail & Related papers (2022-01-04T13:58:22Z)