AI-based Re-identification of Behavioral Clickstream Data
- URL: http://arxiv.org/abs/2201.10351v1
- Date: Fri, 21 Jan 2022 16:49:00 GMT
- Title: AI-based Re-identification of Behavioral Clickstream Data
- Authors: Stefan Vamosi and Michael Platzer and Thomas Reutterer
- Abstract summary: This paper demonstrates that similar techniques can be applied to successfully re-identify individuals purely based on their behavioral patterns.
The mere resemblance of behavioral patterns between records is sufficient to correctly attribute behavioral data to identified individuals.
We also demonstrate how synthetic data can offer a viable alternative that is shown to be resilient against our introduced AI-based re-identification attacks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: AI-based face recognition, i.e., the re-identification of individuals within
images, is an already well established technology for video surveillance, for
user authentication, for tagging photos of friends, etc. This paper
demonstrates that similar techniques can be applied to successfully re-identify
individuals purely based on their behavioral patterns. In contrast to
de-anonymization attacks based on record linkage, these methods do not require
any overlap in data points between a released dataset and an identified
auxiliary dataset. The mere resemblance of behavioral patterns between records
is sufficient to correctly attribute behavioral data to identified individuals.
Further, we can demonstrate that data perturbation does not provide protection,
unless a significant share of data utility is destroyed. These findings
call for serious caution when sharing actual behavioral data with third
parties, as modern-day privacy regulations, like the GDPR, define their scope
based on the ability to re-identify. This also has strong implications for the
marketing domain when dealing with potentially re-identifiable data sources
like shopping behavior, clickstream data or cookies. We also demonstrate how
synthetic data can offer a viable alternative that is shown to be resilient
against our introduced AI-based re-identification attacks.
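The core idea of the attack can be sketched in a few lines. Note that the paper itself trains a neural network on clickstream sequences; the snippet below is only an illustrative stand-in (all names and data are hypothetical), using bigram-frequency profiles and cosine similarity in place of a learned behavioral embedding. The key point it demonstrates is that the released record shares no individual data points with the auxiliary dataset, yet the resemblance of behavioral patterns alone suffices for attribution:

```python
# Illustrative sketch, NOT the paper's actual model: re-identification by
# behavioral similarity alone, with no overlapping events between the
# released record and the identified auxiliary dataset.
from collections import Counter
from math import sqrt

def profile(clicks):
    """Bigram-frequency vector summarizing a user's browsing habits."""
    return Counter(zip(clicks, clicks[1:]))

def cosine(p, q):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(p[k] * q.get(k, 0) for k in p)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def reidentify(anon_clicks, identified_profiles):
    """Attribute an anonymous record to the most behaviorally similar individual."""
    target = profile(anon_clicks)
    return max(identified_profiles,
               key=lambda name: cosine(target, identified_profiles[name]))

# Hypothetical auxiliary dataset of identified individuals (sessions recorded
# separately; no data points are shared with the released record below).
known = {
    "alice": profile(["news", "sports", "news", "sports", "news"]),
    "bob":   profile(["shop", "cart", "shop", "cart", "checkout"]),
}
released = ["sports", "news", "sports", "news"]  # anonymous session
print(reidentify(released, known))  # prints: alice
```

A learned embedding replaces the hand-crafted bigram profile in the actual attack, which is what lets it generalize across different behavioral domains.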
Related papers
- Footprints of Data in a Classifier Model: The Privacy Issues and Their Mitigation through Data Obfuscation [0.9208007322096533]
The embedding of footprints of training data in a prediction model is one such facet.
A difference in performance between test and training data enables passive identification of the data that trained the model.
This research focuses on addressing the vulnerability arising from these data footprints.
arXiv Detail & Related papers (2024-07-02T13:56:37Z)
- Synthetic Data Outliers: Navigating Identity Disclosure [3.8811062755861956]
We analyze the privacy of synthetic data with respect to outliers.
Our main findings suggest that re-identification of outliers via linkage attack is feasible and easily achieved.
Additional safeguards such as differential privacy can prevent re-identification, albeit at the expense of data utility.
arXiv Detail & Related papers (2024-06-04T19:35:44Z)
- Data-Driven but Privacy-Conscious: Pedestrian Dataset De-identification via Full-Body Person Synthesis [16.394031759681678]
We motivate and introduce the Pedestrian Dataset De-Identification (PDI) task.
PDI evaluates the degree of de-identification and downstream task training performance for a given de-identification method.
We show how our data is able to narrow the synthetic-to-real performance gap in a privacy-conscious manner.
arXiv Detail & Related papers (2023-06-20T17:39:24Z)
- ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners.
Experiments are conducted on six image classification datasets, consisting of three natural object datasets and three medical datasets.
arXiv Detail & Related papers (2022-12-04T08:49:14Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Autoregressive Perturbations for Data Poisoning [54.205200221427994]
Data scraping from social media has led to growing concerns regarding unauthorized use of data.
Data poisoning attacks have been proposed as a bulwark against scraping.
We introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset.
arXiv Detail & Related papers (2022-06-08T06:24:51Z) - RealGait: Gait Recognition for Person Re-Identification [79.67088297584762]
We construct a new gait dataset by extracting silhouettes from an existing video person re-identification challenge, which consists of 1,404 persons walking in an unconstrained manner.
Our results suggest that recognizing people by their gait in real surveillance scenarios is feasible, and that the underlying gait pattern is probably the true reason why video person re-identification works in practice.
arXiv Detail & Related papers (2022-01-13T06:30:56Z) - Modelling Adversarial Noise for Adversarial Defense [96.56200586800219]
Adversarial defenses typically focus on exploiting adversarial examples to remove adversarial noise or to train an adversarially robust target model.
We are motivated by the observation that the relationship between adversarial data and natural data can help infer clean data from adversarial data and so obtain the final correct prediction.
We study modelling adversarial noise to learn the transition relationship in the label space, using adversarial labels to improve adversarial accuracy.
arXiv Detail & Related papers (2021-09-21T01:13:26Z) - Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z) - How important are faces for person re-identification? [14.718372669984364]
We apply a face detection and blurring algorithm to create anonymized versions of several popular person re-identification datasets.
We evaluate the effect of this anonymization on re-identification performance using standard metrics.
arXiv Detail & Related papers (2020-10-13T11:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.