Breaking Anonymity at Scale: Re-identifying the Trajectories of 100K Real Users in Japan
- URL: http://arxiv.org/abs/2506.05611v1
- Date: Thu, 05 Jun 2025 21:51:50 GMT
- Title: Breaking Anonymity at Scale: Re-identifying the Trajectories of 100K Real Users in Japan
- Authors: Abhishek Kumar Mishra, Mathieu Cunche, Heber H. Arcolezi,
- Abstract summary: We analyze the anonymized Yjmob100k dataset, which captures the trajectories of 100,000 users in Japan.<n>We leverage population density patterns, structural correlations, and temporal activity profiles to re-identify the dataset's real-world location and timing.<n>This work underscores the limitations of current trajectory anonymization methods and calls for more robust privacy mechanisms in the publication of mobility data.
- Score: 0.8192907805418581
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mobility traces represent a critical class of personal data, often subjected to privacy-preserving transformations before public release. In this study, we analyze the anonymized Yjmob100k dataset, which captures the trajectories of 100,000 users in Japan, and demonstrate how existing anonymization techniques fail to protect their sensitive attributes. We leverage population density patterns, structural correlations, and temporal activity profiles to re-identify the dataset's real-world location and timing. Our results reveal that the anonymization process carried out for Yjmob100k is inefficient and preserves enough spatial and temporal structure to enable re-identification. This work underscores the limitations of current trajectory anonymization methods and calls for more robust privacy mechanisms in the publication of mobility data.
Related papers
- Self-Refining Language Model Anonymizers via Adversarial Distillation [49.17383264812234]
Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data poses emerging privacy risks.<n>We introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization.
arXiv Detail & Related papers (2025-06-02T08:21:27Z) - Investigating Vulnerabilities of GPS Trip Data to Trajectory-User Linking Attacks [49.1574468325115]
We propose a novel attack to reconstruct user identifiers in GPS trip datasets consisting of single trips.<n>We show that the risk of re-identification is significant even when personal identifiers have been removed.<n>Further investigations indicate that users who frequently visit locations that are only visited by a small number of others tend to be more vulnerable to re-identification.
arXiv Detail & Related papers (2025-02-12T08:54:49Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Disentangle Before Anonymize: A Two-stage Framework for Attribute-preserved and Occlusion-robust De-identification [55.741525129613535]
"Disentangle Before Anonymize" is a novel two-stage Framework(DBAF)<n>This framework includes a Contrastive Identity Disentanglement (CID) module and a Key-authorized Reversible Identity Anonymization (KRIA) module.<n>Extensive experiments demonstrate that our method outperforms state-of-the-art de-identification approaches.
arXiv Detail & Related papers (2023-11-15T08:59:02Z) - Privacy-Preserving Hierarchical Anonymization Framework over Encrypted Data [0.061446808540639365]
This study proposes a hierarchical k-anonymization framework using homomorphic encryption and secret sharing composed of two types of domains.
The experimental results show that connecting two domains can accelerate the anonymization process, indicating that the proposed secure hierarchical architecture is practical and efficient.
arXiv Detail & Related papers (2023-10-19T01:08:37Z) - A Trajectory K-Anonymity Model Based on Point Density and Partition [0.0]
This paper develops a trajectory K-anonymity model based on Point Density and Partition (K PDP)
It successfully resists re-identification attacks and reduces the data utility loss of the k-anonymized dataset.
arXiv Detail & Related papers (2023-07-31T17:10:56Z) - A False Sense of Privacy: Towards a Reliable Evaluation Methodology for the Anonymization of Biometric Data [8.799600976940678]
Biometric data contains distinctive human traits such as facial features or gait patterns.
Privacy protection is extensively afforded by the technique of anonymization.
We assess the state-of-the-art methods used to evaluate the performance of anonymization.
arXiv Detail & Related papers (2023-04-04T08:46:14Z) - Attribute-preserving Face Dataset Anonymization via Latent Code
Optimization [64.4569739006591]
We present a task-agnostic anonymization procedure that directly optimize the images' latent representation in the latent space of a pre-trained GAN.
We demonstrate through a series of experiments that our method is capable of anonymizing the identity of the images whilst -- crucially -- better-preserving the facial attributes.
arXiv Detail & Related papers (2023-03-20T17:34:05Z) - Learnable Privacy-Preserving Anonymization for Pedestrian Images [27.178354411900127]
This paper studies a novel privacy-preserving anonymization problem for pedestrian images.
It preserves personal identity information (PII) for authorized models and prevents PII from being recognized by third parties.
We propose a joint learning reversible anonymization framework, which can reversibly generate full-body anonymous images.
arXiv Detail & Related papers (2022-07-24T07:04:16Z) - Releasing survey microdata with exact cluster locations and additional
privacy safeguards [77.34726150561087]
We propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards.
Our strategy reduces the respondents' re-identification risk for any number of disclosed attributes by 60-80% even under re-identification attempts.
arXiv Detail & Related papers (2022-05-24T19:37:11Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates allows inference of data based on private data.
perturbations chosen independently at every agent, resulting in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible.
arXiv Detail & Related papers (2020-10-23T10:35:35Z) - How important are faces for person re-identification? [14.718372669984364]
We apply a face detection and blurring algorithm to create anonymized versions of several popular person re-identification datasets.
We evaluate the effect of this anonymization on re-identification performance using standard metrics.
arXiv Detail & Related papers (2020-10-13T11:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.