Generating synthetic mobility data for a realistic population with RNNs
to improve utility and privacy
- URL: http://arxiv.org/abs/2201.01139v1
- Date: Tue, 4 Jan 2022 13:58:22 GMT
- Title: Generating synthetic mobility data for a realistic population with RNNs
to improve utility and privacy
- Authors: Alex Berke, Ronan Doorley, Kent Larson, Esteban Moro
- Abstract summary: We present a system for generating synthetic mobility data using a deep recurrent neural network (RNN).
The system takes a population distribution as input and generates mobility traces for a corresponding synthetic population.
We show the generated mobility data retain the characteristics of the real data, while varying from the real data at the individual level.
- Score: 3.3918638314432936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Location data collected from mobile devices represent mobility behaviors at
individual and societal levels. These data have important applications ranging
from transportation planning to epidemic modeling. However, issues must be
overcome to best serve these use cases: The data often represent a limited
sample of the population and use of the data jeopardizes privacy.
To address these issues, we present and evaluate a system for generating
synthetic mobility data using a deep recurrent neural network (RNN) which is
trained on real location data. The system takes a population distribution as
input and generates mobility traces for a corresponding synthetic population.
Related generative approaches have not solved the challenges of capturing
both the patterns and variability in individuals' mobility behaviors over
longer time periods, while also balancing the generation of realistic data with
privacy. Our system leverages RNNs' ability to generate complex and novel
sequences while retaining patterns from training data. Also, the model
introduces randomness used to calibrate the variation between the synthetic and
real data at the individual level. This is to both capture variability in human
mobility, and protect user privacy.
Location based services (LBS) data from more than 22,700 mobile devices were
used in an experimental evaluation across utility and privacy metrics. We show
the generated mobility data retain the characteristics of the real data, while
varying from the real data at the individual level, and where this amount of
variation matches the variation within the real data.
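The abstract describes an RNN that generates location sequences and introduces randomness to calibrate how far synthetic traces vary from real ones. The sketch below is only an illustration of that general idea, not the authors' implementation: it assumes the randomness is a sampling temperature applied to the output logits (the abstract does not say so), and it uses a tiny untrained Elman-style RNN with random weights. The names `TinyRNN`, `generate_trace`, and `temperature` are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNN:
    """Untrained Elman-style RNN over location IDs (hypothetical, for illustration)."""

    def __init__(self, n_locations, hidden_size, rng):
        self.n = n_locations
        self.h = hidden_size

        def u():
            return rng.uniform(-1.0, 1.0)

        self.W_in = [[u() for _ in range(n_locations)] for _ in range(hidden_size)]
        self.W_hh = [[u() for _ in range(hidden_size)] for _ in range(hidden_size)]
        self.W_out = [[u() for _ in range(hidden_size)] for _ in range(n_locations)]

    def step(self, loc, hidden):
        # A one-hot input simply selects column `loc` of W_in.
        new_hidden = [
            math.tanh(
                self.W_in[i][loc]
                + sum(self.W_hh[i][j] * hidden[j] for j in range(self.h))
            )
            for i in range(self.h)
        ]
        logits = [
            sum(self.W_out[k][i] * new_hidden[i] for i in range(self.h))
            for k in range(self.n)
        ]
        return logits, new_hidden


def generate_trace(model, start_loc, length, temperature, rng):
    """Sample a sequence of location IDs, one step at a time."""
    hidden = [0.0] * model.h
    trace = [start_loc]
    for _ in range(length - 1):
        logits, hidden = model.step(trace[-1], hidden)
        probs = softmax(logits, temperature)
        # Assumption: randomness enters through temperature-scaled sampling.
        trace.append(rng.choices(range(model.n), weights=probs, k=1)[0])
    return trace


model = TinyRNN(n_locations=5, hidden_size=4, rng=random.Random(0))
conservative = generate_trace(model, 0, 24, temperature=0.2, rng=random.Random(1))
exploratory = generate_trace(model, 0, 24, temperature=2.0, rng=random.Random(1))
```

In a sketch like this, a lower temperature concentrates probability on the most likely next location, while a higher temperature spreads it out and yields more varied traces. A trained model and a calibration step against real data would be needed before any comparison with the paper's results is meaningful.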
Related papers
- Reconsidering utility: unveiling the limitations of synthetic mobility data generation algorithms in real-life scenarios [49.1574468325115]
We evaluate the utility of five state-of-the-art synthesis approaches in terms of real-world applicability.
We focus on so-called trip data that encode fine granular urban movements such as GPS-tracked taxi rides.
One model fails to produce data within reasonable time and another generates too many jumps to meet the requirements for map matching.
arXiv Detail & Related papers (2024-07-03T16:08:05Z)
- Deep Temporal Deaggregation: Large-Scale Spatio-Temporal Generative Models [5.816964541847194]
We propose a transformer-based diffusion model, TDDPM, for time-series which outperforms and scales substantially better than state-of-the-art.
This is evaluated in a new comprehensive benchmark across several sequence lengths, standard datasets, and evaluation measures.
arXiv Detail & Related papers (2024-06-18T09:16:11Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- Representation Learning for Wearable-Based Applications in the Case of Missing Data [20.37256375888501]
Modeling multimodal sensor data in real-world environments is still challenging due to low data quality and limited data annotations.
We investigate representation learning for imputing missing wearable data and compare it with state-of-the-art statistical approaches.
Our study provides insights for the design and development of masking-based self-supervised learning tasks.
arXiv Detail & Related papers (2024-01-08T08:21:37Z)
- On Inferring User Socioeconomic Status with Mobility Records [61.0966646857356]
We propose a socioeconomic-aware deep model called DeepSEI.
The DeepSEI model incorporates two networks called deep network and recurrent network.
We conduct extensive experiments on real mobility records data, POI data and house prices data.
arXiv Detail & Related papers (2022-11-15T15:07:45Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Improving Correlation Capture in Generating Imbalanced Data using Differentially Private Conditional GANs [2.2265840715792735]
We propose DP-CGANS, a differentially private conditional GAN framework consisting of data transformation, sampling, conditioning, and networks training to generate realistic and privacy-preserving data.
We extensively evaluate our model with state-of-the-art generative models on three public datasets and two real-world personal health datasets in terms of statistical similarity, machine learning performance, and privacy measurement.
arXiv Detail & Related papers (2022-06-28T06:47:27Z)
- Transformer Networks for Data Augmentation of Human Physical Activity Recognition [61.303828551910634]
State-of-the-art models like Recurrent Generative Adversarial Networks (RGAN) are used to generate realistic synthetic data.
In this paper, transformer-based generative adversarial networks, which have global attention on data, are compared with RGAN on the PAMAP2 and Real World Human Activity Recognition data sets.
arXiv Detail & Related papers (2021-09-02T16:47:29Z)
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
- AttnMove: History Enhanced Trajectory Recovery via Attentional Network [15.685998183691655]
We propose a novel attentional neural network-based model, named AttnMove, to densify individual trajectories by recovering unobserved locations.
We evaluate our model on two real-world datasets, and extensive results demonstrate the performance gain compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-01-03T15:45:35Z)
- Differentially Private Synthetic Medical Data Generation using Convolutional GANs [7.2372051099165065]
We develop a differentially private framework for synthetic data generation using Rényi differential privacy.
Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve some of the critical characteristics of the generated synthetic data.
We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget.
arXiv Detail & Related papers (2020-12-22T01:03:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.