AI-based Re-identification of Behavioral Clickstream Data
- URL: http://arxiv.org/abs/2201.10351v1
- Date: Fri, 21 Jan 2022 16:49:00 GMT
- Title: AI-based Re-identification of Behavioral Clickstream Data
- Authors: Stefan Vamosi and Michael Platzer and Thomas Reutterer
- Abstract summary: This paper demonstrates that techniques similar to AI-based face recognition can be applied to successfully re-identify individuals purely based on their behavioral patterns.
The mere resemblance of behavioral patterns between records is sufficient to correctly attribute behavioral data to identified individuals.
We also demonstrate how synthetic data can offer a viable alternative that is shown to be resilient against our introduced AI-based re-identification attacks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: AI-based face recognition, i.e., the re-identification of individuals within
images, is already a well-established technology for video surveillance, for
user authentication, for tagging photos of friends, etc. This paper
demonstrates that similar techniques can be applied to successfully re-identify
individuals purely based on their behavioral patterns. In contrast to
de-anonymization attacks based on record linkage, these methods do not require
any overlap in data points between a released dataset and an identified
auxiliary dataset. The mere resemblance of behavioral patterns between records
is sufficient to correctly attribute behavioral data to identified individuals.
Further, we can demonstrate that data perturbation does not provide protection,
unless a significant share of data utility is being destroyed. These findings
call for serious caution when sharing actual behavioral data with third
parties, as modern-day privacy regulations, like the GDPR, define their scope
based on the ability to re-identify. This also has strong implications for the
marketing domain when dealing with potentially re-identifiable data sources
like shopping behavior, clickstream data or cookies. We also demonstrate how
synthetic data can offer a viable alternative that is shown to be resilient
against our introduced AI-based re-identification attacks.
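The idea that resemblance of behavioral patterns alone can link records, without any shared data points, can be illustrated with a small sketch. This is not the method of the paper: it swaps in a much simpler technique (cosine similarity between page-category frequency profiles) and runs on simulated clicks. The category names, the functions `simulate_user_preferences`, `simulate_clicks`, `profile`, `reidentify`, and `run_demo`, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical page categories; a real clickstream would have its own vocabulary.
CATEGORIES = ["home", "search", "product", "cart", "checkout", "help"]


def simulate_user_preferences(rng):
    """Draw an illustrative per-user probability distribution over categories."""
    weights = [rng.random() ** 3 for _ in CATEGORIES]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_clicks(preferences, n_clicks, rng):
    """Sample a click sequence from a user's category preferences."""
    return rng.choices(CATEGORIES, weights=preferences, k=n_clicks)


def profile(clicks):
    """Summarize a click sequence as relative category frequencies."""
    counts = Counter(clicks)
    return [counts[c] / len(clicks) for c in CATEGORIES]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reidentify(released, auxiliary):
    """Map each released record id to the most similar identified auxiliary profile."""
    return {
        record_id: max(auxiliary, key=lambda name: cosine(rec, auxiliary[name]))
        for record_id, rec in released.items()
    }


def run_demo(n_users=20, n_clicks=300, seed=0):
    """Return the share of released records attributed to the correct user."""
    rng = random.Random(seed)
    prefs = {f"user{i}": simulate_user_preferences(rng) for i in range(n_users)}
    # Two independently sampled click sequences per user: the released and
    # auxiliary data share no events, only the underlying behavior.
    auxiliary = {n: profile(simulate_clicks(p, n_clicks, rng)) for n, p in prefs.items()}
    # Released records are keyed by the true user name only so the demo can score itself.
    released = {n: profile(simulate_clicks(p, n_clicks, rng)) for n, p in prefs.items()}
    matches = reidentify(released, auxiliary)
    return sum(1 for rid, name in matches.items() if rid == name) / n_users


if __name__ == "__main__":
    print(f"re-identification accuracy: {run_demo():.2f} (chance: {1 / 20:.2f})")
```

Frequency profiles discard the ordering and timing of clicks, so this sketch only conveys the intuition behind similarity-based attribution; it says nothing about the accuracy reported in the paper.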
Related papers
- SIG: A Synthetic Identity Generation Pipeline for Generating Evaluation Datasets for Face Recognition [0.0]
We introduce the Synthetic Identity Generation pipeline, or SIG, that allows for the targeted creation of ethical, balanced datasets for face recognition evaluation.
Our pipeline generates high-quality images of synthetic identities with controllable pose, facial features, and demographic attributes, such as race, gender, and age.
We also release an open-source evaluation dataset named ControlFace10k, consisting of 10,008 face images of 3,336 unique synthetic identities balanced across race, gender, and age.
arXiv Detail & Related papers (2024-09-12T18:18:02Z)
- Investigating Imperceptibility of Adversarial Attacks on Tabular Data: An Empirical Analysis [1.6693963355435217]
Adversarial attacks are a potential threat to machine learning models.
These attacks cause incorrect predictions through imperceptible perturbations to the input data.
This study proposes a set of key properties and corresponding metrics to assess the imperceptibility of adversarial attacks.
arXiv Detail & Related papers (2024-07-16T07:55:25Z)
- Synthetic Data Outliers: Navigating Identity Disclosure [3.8811062755861956]
We analyze the privacy of synthetic data with respect to outliers.
Our main findings suggest that re-identification of outliers via linkage attacks is feasible and easily achieved.
Additional safeguards such as differential privacy can prevent re-identification, albeit at the expense of the data utility.
arXiv Detail & Related papers (2024-06-04T19:35:44Z)
- ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners.
Experiments are conducted on six image classification datasets, consisting of three natural object datasets and three medical datasets.
arXiv Detail & Related papers (2022-12-04T08:49:14Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Autoregressive Perturbations for Data Poisoning [54.205200221427994]
Data scraping from social media has led to growing concerns regarding unauthorized use of data.
Data poisoning attacks have been proposed as a bulwark against scraping.
We introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset.
arXiv Detail & Related papers (2022-06-08T06:24:51Z)
- RealGait: Gait Recognition for Person Re-Identification [79.67088297584762]
We construct a new gait dataset by extracting silhouettes from an existing video person re-identification challenge which consists of 1,404 persons walking in an unconstrained manner.
Our results suggest that recognizing people by their gait in real surveillance scenarios is feasible and the underlying gait pattern is probably the true reason why video person re-identification works in practice.
arXiv Detail & Related papers (2022-01-13T06:30:56Z)
- Modelling Adversarial Noise for Adversarial Defense [96.56200586800219]
Adversarial defenses typically focus on exploiting adversarial examples to remove adversarial noise or train an adversarially robust target model.
This work is motivated by the observation that the relationship between adversarial data and natural data can help infer clean data from adversarial data to obtain the final correct prediction.
We study how to model adversarial noise by learning the transition relationship in the label space, so that adversarial labels can be used to improve adversarial accuracy.
arXiv Detail & Related papers (2021-09-21T01:13:26Z)
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
- How important are faces for person re-identification? [14.718372669984364]
We apply a face detection and blurring algorithm to create anonymized versions of several popular person re-identification datasets.
We evaluate the effect of this anonymization on re-identification performance using standard metrics.
arXiv Detail & Related papers (2020-10-13T11:47:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.