Related papers: Adversarial Machine Learning-Enabled Anonymization of OpenWiFi Data

Adversarial Machine Learning-Enabled Anonymization of OpenWiFi Data

URL: http://arxiv.org/abs/2401.01542v1
Date: Wed, 3 Jan 2024 04:59:03 GMT
Title: Adversarial Machine Learning-Enabled Anonymization of OpenWiFi Data
Authors: Samhita Kuili, Kareem Dabbour, Irtiza Hasan, Andrea Herscovich, Burak Kantarci, Marcel Chenier, Melike Erol-Kantarci
Abstract summary: Data privacy and protection through anonymization is a critical issue for network operators or data owners before it is forwarded for other possible use of data. OpenWiFi networks are vulnerable to any adversary who is trying to gain access or knowledge on traffic regardless of the knowledge possessed by data owners. CTGAN yields synthetic data; which disguises as actual data but fostering hidden acute information of actual data.
Score: 9.492736565723892
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Data privacy and protection through anonymization is a critical issue for network operators or data owners before it is forwarded for other possible use of data. With the adoption of Artificial Intelligence (AI), data anonymization augments the likelihood of covering up necessary sensitive information; preventing data leakage and information loss. OpenWiFi networks are vulnerable to any adversary who is trying to gain access or knowledge on traffic regardless of the knowledge possessed by data owners. The odds for discovery of actual traffic information is addressed by applied conditional tabular generative adversarial network (CTGAN). CTGAN yields synthetic data; which disguises as actual data but fostering hidden acute information of actual data. In this paper, the similarity assessment of synthetic with actual data is showcased in terms of clustering algorithms followed by a comparison of performance for unsupervised cluster validation metrics. A well-known algorithm, K-means outperforms other algorithms in terms of similarity assessment of synthetic data over real data while achieving nearest scores 0.634, 23714.57, and 0.598 as Silhouette, Calinski and Harabasz and Davies Bouldin metric respectively. On exploiting a comparative analysis in validation scores among several algorithms, K-means forms the epitome of unsupervised clustering algorithms ensuring explicit usage of synthetic data at the same time a replacement for real data. Hence, the experimental results aim to show the viability of using CTGAN-generated synthetic data in lieu of publishing anonymized data to be utilized in various applications.

Related papers

Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs.<n>We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties.<n>Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results.<n>We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
arXiv Detail & Related papers (2025-07-30T03:20:15Z)
Contrastive Learning-Based privacy metrics in Tabular Synthetic Datasets [40.67424997797513]
Synthetic data has garnered attention as a Privacy Enhancing Technology (PET) in sectors such as healthcare and finance. Similarity-based methods aim at finding the level of similarity between training and synthetic data. Attack-based methods conduce deliberate attacks on synthetic datasets.
arXiv Detail & Related papers (2025-02-19T15:52:23Z)
Approaching Metaheuristic Deep Learning Combos for Automated Data Mining [0.5419570023862531]
This work proposes a means of combining meta-heuristic methods with conventional classifiers and neural networks in order to perform automated data mining. Experiments on the MNIST dataset for handwritten digit recognition were performed. It was empirically observed that using a ground truth labeled dataset's validation accuracy is inadequate for correcting labels of other previously unseen data instances.
arXiv Detail & Related papers (2024-10-16T10:28:22Z)
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs. Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
Differentially Private Synthetic Data Using KD-Trees [11.96971298978997]
We exploit space partitioning techniques together with noise perturbation and thus achieve intuitive and transparent algorithms. We propose both data independent and data dependent algorithms for $epsilon$-differentially private synthetic data generation. We show empirical utility improvements over the prior work, and discuss performance of our algorithm on a downstream classification task on a real dataset.
arXiv Detail & Related papers (2023-06-19T17:08:32Z)
Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution. We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
Personalized Decentralized Multi-Task Learning Over Dynamic Communication Graphs [59.96266198512243]
We propose a decentralized and federated learning algorithm for tasks that are positively and negatively correlated. Our algorithm uses gradients to calculate the correlations among tasks automatically, and dynamically adjusts the communication graph to connect mutually beneficial tasks and isolate those that may negatively impact each other. We conduct experiments on a synthetic Gaussian dataset and a large-scale celebrity attributes (CelebA) dataset.
arXiv Detail & Related papers (2022-12-21T18:58:24Z)
Secure Multiparty Computation for Synthetic Data Generation from Distributed Data [7.370727048591523]
Legal and ethical restrictions on accessing relevant data inhibit data science research in critical domains such as health, finance, and education. Existing approaches assume that the data holders supply their raw data to a trusted curator, who uses it as fuel for synthetic data generation. We propose the first solution in which data holders only share encrypted data for differentially private synthetic data generation.
arXiv Detail & Related papers (2022-10-13T20:09:17Z)
Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that not only the issue of data heterogeneity in current setups is not necessarily a problem but also in fact it can be beneficial for the FL participants. Our observations are intuitive. Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
AI-based Re-identification of Behavioral Clickstream Data [0.0]
This paper demonstrates that similar techniques can be applied to successfully re-identify individuals purely based on their behavioral patterns. The mere resemblance of behavioral patterns between records is sufficient to correctly attribute behavioral data to identified individuals. We also demonstrate how synthetic data can offer a viable alternative, that is shown to be resilient against our introduced AI-based re-identification attacks.
arXiv Detail & Related papers (2022-01-21T16:49:00Z)
Using Synthetic Data to Enhance the Accuracy of Fingerprint-Based Localization: A Deep Learning Approach [1.6379393441314491]
We introduce a novel approach to reduce training data collection costs in fingerprint-based localization by using synthetic data. Generative adversarial networks (GANs) are used to learn the distribution of a limited sample of collected data. We can obtain essentially similar positioning accuracy to that which would be obtained by using the full set of collected data.
arXiv Detail & Related papers (2021-05-05T07:36:01Z)
Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process. We generate a representative as well as fair version of the UCI Adult census data set. We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning assumes the incoming data are fully labeled, which might not be applicable in real applications. We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN) We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
arXiv Detail & Related papers (2021-01-02T09:04:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.