Driving Privacy Forward: Mitigating Information Leakage within Smart Vehicles through Synthetic Data Generation
- URL: http://arxiv.org/abs/2410.08462v1
- Date: Fri, 11 Oct 2024 02:28:27 GMT
- Title: Driving Privacy Forward: Mitigating Information Leakage within Smart Vehicles through Synthetic Data Generation
- Authors: Krish Parikh
- Abstract summary: We propose a taxonomy of 14 in-vehicle sensors, identifying potential attacks and categorising their vulnerability.
We then focus on the most vulnerable signals, using the Passive Vehicular Sensor (PVS) dataset to generate synthetic data.
Our results show that we achieved 90.1% statistical similarity and 78% classification accuracy when tested on the dataset's original task.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Smart vehicles produce large amounts of data, much of which is sensitive and at risk of privacy breaches. As attackers increasingly exploit anonymised metadata within these datasets to profile drivers, it is important to find solutions that mitigate this information leakage without hindering innovation and ongoing research. Synthetic data has emerged as a promising tool to address these privacy concerns, as it allows for the replication of real-world data relationships while minimising the risk of revealing sensitive information. In this paper, we examine the use of synthetic data to tackle these challenges. We start by proposing a comprehensive taxonomy of 14 in-vehicle sensors, identifying potential attacks and categorising their vulnerability. We then focus on the most vulnerable signals, using the Passive Vehicular Sensor (PVS) dataset, which comprises over 1 million data points, to generate synthetic data with a Tabular Variational Autoencoder (TVAE) model. Finally, we evaluate the synthetic data against three core metrics: fidelity, utility, and privacy. Our results show that we achieved 90.1% statistical similarity and 78% classification accuracy when tested on the dataset's original task, while also preventing the profiling of the driver. The code can be found at https://github.com/krish-parikh/Synthetic-Data-Generation
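As an illustration of this kind of pipeline (a minimal sketch, not the authors' released code, which is linked above), the fit/sample/evaluate loop could look like the following with SDV's TVAE synthesizer; the CSV path and the use of SDV are assumptions for this example:

```python
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import TVAESynthesizer
from sdv.evaluation.single_table import evaluate_quality

# Hypothetical CSV export of PVS-style in-vehicle sensor signals
real = pd.read_csv("pvs_signals.csv")

# Infer column types, then fit the Tabular VAE on the real data
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(real)
synth = TVAESynthesizer(metadata, epochs=300)
synth.fit(real)

# Sample a synthetic table of the same size and score its fidelity
synthetic = synth.sample(num_rows=len(real))
report = evaluate_quality(real_data=real, synthetic_data=synthetic, metadata=metadata)
print(report.get_score())  # overall statistical-similarity score in [0, 1]
```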
Related papers
- Defining 'Good': Evaluation Framework for Synthetic Smart Meter Data [14.779917834583577]
We show that standard privacy attack methods are inadequate for assessing privacy risks of smart meter datasets.
We propose an improved method by injecting training data with implausible outliers, then launching privacy attacks directly on these outliers.
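A toy sketch of that outlier-injection probe: plant implausible readings in the training data, then check whether the generator reproduces points suspiciously close to them. The stand-in sampler below is a placeholder for a real fitted synthesizer, and the outlier values are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Realistic smart-meter-like readings plus a few implausible outliers
real = pd.DataFrame({"kwh": rng.normal(1.0, 0.3, 1000)})
outliers = pd.DataFrame({"kwh": [25.0, 31.0, 42.0]})  # implausible consumption
train = pd.concat([real, outliers], ignore_index=True)

# Stand-in for a generator fitted on `train`; swap in a real synthesizer here
def sample_synthetic(n):
    return pd.DataFrame({"kwh": rng.choice(train["kwh"], size=n)})

synthetic = sample_synthetic(5000)

# If synthetic records land suspiciously close to the injected outliers,
# the generator has memorised them -- a direct signal of privacy leakage
for x in outliers["kwh"]:
    nearest = np.abs(synthetic["kwh"] - x).min()
    print(f"outlier {x}: nearest synthetic point at distance {nearest:.3f}")
```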
arXiv Detail & Related papers (2024-07-16T14:41:27Z)
- Footprints of Data in a Classifier Model: The Privacy Issues and Their Mitigation through Data Obfuscation [0.9208007322096533]
The embedding of training-data footprints in a prediction model is one such facet.
The difference in performance between test and training data enables passive identification of the data that trained the model.
This research focuses on addressing the vulnerability arising from the data footprints.
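The performance-gap leak described above is easy to demonstrate: a model that is more confident on its training points than on unseen points passively reveals membership. A generic sketch on synthetic data, with an illustrative threshold:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Deliberately overfit so the train/test confidence gap is visible
clf = RandomForestClassifier(n_estimators=200).fit(X_tr, y_tr)

conf_tr = clf.predict_proba(X_tr).max(axis=1)  # confidence on members
conf_te = clf.predict_proba(X_te).max(axis=1)  # confidence on non-members

# A simple threshold attack: call a point a "member" if confidence > tau
tau = 0.9
tpr = (conf_tr > tau).mean()  # members correctly flagged
fpr = (conf_te > tau).mean()  # non-members wrongly flagged
print(f"member hit rate {tpr:.2f} vs non-member false alarms {fpr:.2f}")
```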
arXiv Detail & Related papers (2024-07-02T13:56:37Z)
- Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
RAG systems may face severe privacy risks when retrieving private data.
We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
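A minimal sketch of the idea: retrieval runs over a synthetic stand-in corpus rather than the private documents, so only synthetic passages ever reach the prompt. The corpus strings and query are placeholders, and TF-IDF stands in for whatever retriever a real system would use:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical synthetic documents generated to mirror a private corpus
synthetic_corpus = [
    "Patient records are stored for seven years per retention policy.",
    "Claims are reimbursed within 30 days of approval.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(synthetic_corpus)

def retrieve(query, k=1):
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, doc_vectors).ravel()
    return [synthetic_corpus[i] for i in scores.argsort()[::-1][:k]]

# The retrieved synthetic passage is what gets placed into the LLM prompt,
# so the private originals never reach the generation step
print(retrieve("how long are records kept?"))
```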
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
- Generative AI for Secure and Privacy-Preserving Mobile Crowdsensing [74.58071278710896]
Generative AI has attracted much attention from both academia and industry.
Secure and privacy-preserving mobile crowdsensing (SPPMCS) has been widely applied in data collection and acquisition.
arXiv Detail & Related papers (2024-05-17T04:00:58Z)
- AutoSen: Improving Automatic WiFi Human Sensing through Cross-Modal Autoencoder [56.44764266426344]
WiFi human sensing is highly regarded for its low-cost and privacy advantages in recognizing human activities.
Traditional cross-modal methods, aimed at enabling self-supervised learning without labeled data, struggle to extract meaningful features from amplitude-phase combinations.
We introduce AutoSen, an innovative automatic WiFi sensing solution that departs from conventional approaches.
arXiv Detail & Related papers (2024-01-08T19:50:02Z)
- Adversarial Machine Learning-Enabled Anonymization of OpenWiFi Data [9.492736565723892]
Data privacy and protection through anonymization is a critical issue for network operators and data owners before their data is forwarded for further use.
OpenWiFi networks are vulnerable to any adversary trying to gain access to, or knowledge of, the traffic, regardless of what the data owners know.
CTGAN yields synthetic data that passes as actual data while keeping the sensitive information of the actual data hidden.
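Under the same assumptions as the TVAE sketch earlier (an SDV-style API, a hypothetical capture export), CTGAN-based anonymization follows the same fit/sample pattern:

```python
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import CTGANSynthesizer

traffic = pd.read_csv("openwifi_traffic.csv")  # hypothetical traffic export
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(traffic)

# Release the GAN's samples instead of the raw traffic records
ctgan = CTGANSynthesizer(metadata, epochs=300)
ctgan.fit(traffic)
ctgan.sample(num_rows=len(traffic)).to_csv("openwifi_synthetic.csv", index=False)
```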
arXiv Detail & Related papers (2024-01-03T04:59:03Z)
- A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which consistently demonstrates robust performance with simple and cheap synthesis strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under simple synthesis strategies, it outperforms existing methods by a large margin and also achieves state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
- Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks [70.39633252935445]
Data contamination has become prevalent and challenging with the rise of models pretrained on large automatically-crawled corpora.
For closed models, the training data becomes a trade secret, and even for open models, it is not trivial to detect contamination.
We propose three strategies that can make a difference: (1) test data made public should be encrypted with a public key and licensed to disallow derivative distribution; (2) demand training exclusion controls from closed API holders, and protect your test data by refusing to evaluate without them; and (3) avoid data which appears with its solution on the internet, and release the web-page context of internet-derived data along with the data.
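Strategy (1) is straightforward to put into practice. A sketch of hybrid public-key encryption of a test file using the `cryptography` package; the key handling and file names here are illustrative, not the paper's prescribed tooling:

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Benchmark owner's keypair; only the private key can unwrap the data key
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Encrypt the test set with a fresh symmetric key, then wrap that key
data_key = Fernet.generate_key()
ciphertext = Fernet(data_key).encrypt(open("test_set.jsonl", "rb").read())
wrapped_key = public_key.encrypt(
    data_key,
    padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                 algorithm=hashes.SHA256(), label=None),
)

# Publish ciphertext + wrapped_key; web crawlers see only encrypted bytes
open("test_set.enc", "wb").write(ciphertext)
```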
arXiv Detail & Related papers (2023-05-17T12:23:38Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
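The density-ratio idea behind this style of attack can be sketched with kernel density estimates: where the generator's density far exceeds a reference estimate of the overall distribution, the generator has likely overfit, flagging probable training members. A toy 1-D version (not the DOMIAS implementation; bandwidths and distributions are illustrative):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
synthetic = rng.normal(0.0, 1.0, (2000, 1))   # samples from the generator
reference = rng.normal(0.0, 1.2, (2000, 1))   # attacker's distribution estimate

kde_syn = KernelDensity(bandwidth=0.2).fit(synthetic)
kde_ref = KernelDensity(bandwidth=0.2).fit(reference)

def membership_score(points):
    # High ratio => generator overfits here => likely a training member
    return np.exp(kde_syn.score_samples(points) - kde_ref.score_samples(points))

candidates = np.array([[0.0], [3.5]])
print(membership_score(candidates))
```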
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Privacy-Utility Trades in Crowdsourced Signal Map Obfuscation [20.58763760239068]
Crowdsourced cellular signal strength measurements can be used to generate signal maps that improve network performance.
We consider obfuscating such data before the data leaves the mobile device.
Our evaluation results, based on multiple, diverse, real-world signal map datasets, demonstrate the feasibility of concurrently achieving adequate privacy and utility.
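A minimal sketch of the on-device step: perturb each measurement before upload, here with Laplace noise in the style of local differential privacy; the sensitivity and epsilon values are illustrative, not the paper's tuned mechanisms:

```python
import numpy as np

rng = np.random.default_rng(0)

def obfuscate(rss_dbm, epsilon=1.0, sensitivity=5.0):
    """Add Laplace noise to signal-strength readings before they leave the device."""
    scale = sensitivity / epsilon
    return rss_dbm + rng.laplace(0.0, scale, size=rss_dbm.shape)

measurements = np.array([-71.0, -68.5, -80.2])  # raw RSS readings in dBm
print(obfuscate(measurements))  # only the noised values are uploaded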
arXiv Detail & Related papers (2022-01-13T03:46:22Z)
- Anonymizing Sensor Data on the Edge: A Representation Learning and Transformation Approach [4.920145245773581]
In this paper, we aim to examine the tradeoff between utility and privacy loss by learning low-dimensional representations that are useful for data obfuscation.
We propose deterministic and probabilistic transformations in the latent space of a variational autoencoder to synthesize time series data.
We show that it can anonymize data in real time on resource-constrained edge devices.
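The latent-space transformation can be sketched with stand-in linear encoder/decoder maps (a trained variational autoencoder would replace these random projections): encode the window, perturb the code, decode a surrogate series:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 64, 8                       # window length, latent dimension

# Stand-ins for a trained VAE's encoder/decoder (random linear maps here)
W_enc = rng.normal(0, 1 / np.sqrt(T), (d, T))
W_dec = rng.normal(0, 1 / np.sqrt(d), (T, d))

def anonymize(window, noise_scale=0.5):
    z = W_enc @ window                           # encode to a low-dim latent code
    z_tilde = z + rng.normal(0, noise_scale, d)  # probabilistic transformation
    return W_dec @ z_tilde                       # decode a surrogate time series

raw = np.sin(np.linspace(0, 6.0, T))             # hypothetical sensor window
print(anonymize(raw)[:5])
```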
arXiv Detail & Related papers (2020-11-16T22:32:30Z)