AI-based Re-identification of Behavioral Clickstream Data
- URL: http://arxiv.org/abs/2201.10351v1
- Date: Fri, 21 Jan 2022 16:49:00 GMT
- Title: AI-based Re-identification of Behavioral Clickstream Data
- Authors: Stefan Vamosi and Michael Platzer and Thomas Reutterer
- Abstract summary: This paper demonstrates that similar techniques can be applied to successfully re-identify individuals purely based on their behavioral patterns.
The mere resemblance of behavioral patterns between records is sufficient to correctly attribute behavioral data to identified individuals.
We also demonstrate how synthetic data can offer a viable alternative that is shown to be resilient against our introduced AI-based re-identification attacks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: AI-based face recognition, i.e., the re-identification of individuals within
images, is an already well established technology for video surveillance, for
user authentication, for tagging photos of friends, etc. This paper
demonstrates that similar techniques can be applied to successfully re-identify
individuals purely based on their behavioral patterns. In contrast to
de-anonymization attacks based on record linkage, these methods do not require
any overlap in data points between a released dataset and an identified
auxiliary dataset. The mere resemblance of behavioral patterns between records
is sufficient to correctly attribute behavioral data to identified individuals.
Further, we can demonstrate that data perturbation does not provide protection,
unless a significant share of data utility is destroyed. These findings
call for serious caution when sharing actual behavioral data with third
parties, as modern-day privacy regulations, like the GDPR, define their scope
based on the ability to re-identify. This also has strong implications for the
marketing domain when dealing with potentially re-identifiable data sources
like shopping behavior, clickstream data or cookies. We also demonstrate how
synthetic data can offer a viable alternative that is shown to be resilient
against our introduced AI-based re-identification attacks.
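The core idea of the attack can be sketched in a few lines. Note that the paper itself trains a neural network on clickstream sequences; the snippet below is only an illustrative stand-in (all names and data are hypothetical), using bigram-frequency profiles and cosine similarity in place of a learned behavioral embedding. The key point it demonstrates is that the released record shares no individual data points with the auxiliary dataset, yet the resemblance of behavioral patterns alone suffices for attribution:

```python
# Illustrative sketch, NOT the paper's actual model: re-identification by
# behavioral similarity alone, with no overlapping events between the
# released record and the identified auxiliary dataset.
from collections import Counter
from math import sqrt

def profile(clicks):
    """Bigram-frequency vector summarizing a user's browsing habits."""
    return Counter(zip(clicks, clicks[1:]))

def cosine(p, q):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(p[k] * q.get(k, 0) for k in p)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def reidentify(anon_clicks, identified_profiles):
    """Attribute an anonymous record to the most behaviorally similar individual."""
    target = profile(anon_clicks)
    return max(identified_profiles,
               key=lambda name: cosine(target, identified_profiles[name]))

# Hypothetical auxiliary dataset of identified individuals (sessions recorded
# separately; no data points are shared with the released record below).
known = {
    "alice": profile(["news", "sports", "news", "sports", "news"]),
    "bob":   profile(["shop", "cart", "shop", "cart", "checkout"]),
}
released = ["sports", "news", "sports", "news"]  # anonymous session
print(reidentify(released, known))  # prints: alice
```

A learned embedding replaces the hand-crafted bigram profile in the actual attack, which is what lets it generalize across different behavioral domains.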
Related papers
- Footprints of Data in a Classifier Model: The Privacy Issues and Their Mitigation through Data Obfuscation [0.9208007322096533]
The embedding of footprints of training data in a prediction model is one such facet.
A difference in performance between test and training data enables passive identification of the data that trained the model.
This research focuses on addressing the vulnerability arising from these data footprints.
arXiv Detail & Related papers (2024-07-02T13:56:37Z)
- Synthetic Data Outliers: Navigating Identity Disclosure [3.8811062755861956]
We analyze the privacy of synthetic data with respect to outliers.
Our main findings suggest that re-identification of outliers via linkage attack is feasible and easily achieved.
Additional safeguards such as differential privacy can prevent re-identification, albeit at the expense of data utility.
arXiv Detail & Related papers (2024-06-04T19:35:44Z)
- Data-Driven but Privacy-Conscious: Pedestrian Dataset De-identification via Full-Body Person Synthesis [16.394031759681678]
We motivate and introduce the Pedestrian Dataset De-Identification (PDI) task.
PDI evaluates the degree of de-identification and downstream task training performance for a given de-identification method.
We show how our data is able to narrow the synthetic-to-real performance gap in a privacy-conscious manner.
arXiv Detail & Related papers (2023-06-20T17:39:24Z)
- ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners.
Experiments are conducted on six image classification datasets, consisting of three natural object datasets and three medical datasets.
arXiv Detail & Related papers (2022-12-04T08:49:14Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Autoregressive Perturbations for Data Poisoning [54.205200221427994]
Data scraping from social media has led to growing concerns regarding unauthorized use of data.
Data poisoning attacks have been proposed as a bulwark against scraping.
We introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset.
arXiv Detail & Related papers (2022-06-08T06:24:51Z) - RealGait: Gait Recognition for Person Re-Identification [79.67088297584762]
We construct a new gait dataset by extracting silhouettes from an existing video person re-identification challenge, which consists of 1,404 persons walking in an unconstrained manner.
Our results suggest that recognizing people by their gait in real surveillance scenarios is feasible, and that the underlying gait pattern is probably the true reason why video person re-identification works in practice.
arXiv Detail & Related papers (2022-01-13T06:30:56Z) - Modelling Adversarial Noise for Adversarial Defense [96.56200586800219]
Adversarial defenses typically focus on exploiting adversarial examples to remove adversarial noise or to train an adversarially robust target model.
We are motivated by the observation that the relationship between adversarial data and natural data can help infer clean data from adversarial data and so obtain the final correct prediction.
We study modelling adversarial noise to learn the transition relationship in the label space, using adversarial labels to improve adversarial accuracy.
arXiv Detail & Related papers (2021-09-21T01:13:26Z) - Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z) - How important are faces for person re-identification? [14.718372669984364]
We apply a face detection and blurring algorithm to create anonymized versions of several popular person re-identification datasets.
We evaluate the effect of this anonymization on re-identification performance using standard metrics.
arXiv Detail & Related papers (2020-10-13T11:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.