Related papers: PRIVEE: A Visual Analytic Workflow for Proactive Privacy Risk Inspection of Open Data

PRIVEE: A Visual Analytic Workflow for Proactive Privacy Risk Inspection of Open Data

URL: http://arxiv.org/abs/2208.06481v1
Date: Fri, 12 Aug 2022 19:57:09 GMT
Title: PRIVEE: A Visual Analytic Workflow for Proactive Privacy Risk Inspection of Open Data
Authors: Kaustav Bhattacharjee, Akm Islam, Jaideep Vaidya, and Aritra Dasgupta
Abstract summary: Open data sets that contain personal information are susceptible to adversarial attacks even when anonymized. We develop a visual analytic solution that enables data defenders to gain awareness about the disclosure risks in local, joinable data neighborhoods. We use this problem and domain characterization to develop a set of visual analytic interventions as a defense mechanism.
Score: 3.2136309934080867
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open data sets that contain personal information are susceptible to adversarial attacks even when anonymized. By performing low-cost joins on multiple datasets with shared attributes, malicious users of open data portals might get access to information that violates individuals' privacy. However, open data sets are primarily published using a release-and-forget model, whereby data owners and custodians have little to no cognizance of these privacy risks. We address this critical gap by developing a visual analytic solution that enables data defenders to gain awareness about the disclosure risks in local, joinable data neighborhoods. The solution is derived through a design study with data privacy researchers, where we initially play the role of a red team and engage in an ethical data hacking exercise based on privacy attack scenarios. We use this problem and domain characterization to develop a set of visual analytic interventions as a defense mechanism and realize them in PRIVEE, a visual risk inspection workflow that acts as a proactive monitor for data defenders. PRIVEE uses a combination of risk scores and associated interactive visualizations to let data defenders explore vulnerable joins and interpret risks at multiple levels of data granularity. We demonstrate how PRIVEE can help emulate the attack strategies and diagnose disclosure risks through two case studies with data privacy experts.

Related papers

MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks to evaluate contextual privacy in LLM-agents primarily assess single-turn, low-complexity tasks.<n>We first present a benchmark - MAGPIE comprising 158 real-life high-stakes scenarios across 15 domains.<n>We then evaluate the current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
arXiv Detail & Related papers (2025-06-25T18:04:25Z)
A Survey on Privacy Risks and Protection in Large Language Models [13.602836059584682]
Large Language Models (LLMs) have become increasingly integral to diverse applications, raising privacy concerns.<n>This survey offers a comprehensive overview of privacy risks associated with LLMs and examines current solutions to mitigate these challenges.
arXiv Detail & Related papers (2025-05-04T03:04:07Z)
Investigating Vulnerabilities of GPS Trip Data to Trajectory-User Linking Attacks [49.1574468325115]
We propose a novel attack to reconstruct user identifiers in GPS trip datasets consisting of single trips. We show that the risk of re-identification is significant even when personal identifiers have been removed. Further investigations indicate that users who frequently visit locations that are only visited by a small number of others tend to be more vulnerable to re-identification.
arXiv Detail & Related papers (2025-02-12T08:54:49Z)
FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation [4.772368796656325]
In practice, controlled data access remains a mainstream method for protecting data privacy in many industrial and research environments. We developed the demo prototype FT-PrivacyScore to show that it's possible to efficiently and quantitatively estimate the privacy risk of participating in a model fine-tuning task.
arXiv Detail & Related papers (2024-10-30T02:41:26Z)
Unveiling Privacy Vulnerabilities: Investigating the Role of Structure in Graph Data [17.11821761700748]
This study advances the understanding and protection against privacy risks emanating from network structure. We develop a novel graph private attribute inference attack, which acts as a pivotal tool for evaluating the potential for privacy leakage through network structures. Our attack model poses a significant threat to user privacy, and our graph data publishing method successfully achieves the optimal privacy-utility trade-off.
arXiv Detail & Related papers (2024-07-26T07:40:54Z)
Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy. Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models. This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
$\alpha$-Mutual Information: A Tunable Privacy Measure for Privacy Protection in Data Sharing [4.475091558538915]
This paper adopts Arimoto's $alpha$-Mutual Information as a tunable privacy measure. We formulate a general distortion-based mechanism that manipulates the original data to offer privacy protection.
arXiv Detail & Related papers (2023-10-27T16:26:14Z)
Where you go is who you are -- A study on machine learning based semantic privacy attacks [3.259843027596329]
We present a systematic analysis of two attack scenarios, namely location categorization and user profiling. Experiments on the Foursquare dataset and tracking data demonstrate the potential for abuse of high-quality spatial information. Our findings point out the risks of ever-growing databases of tracking data and spatial context data.
arXiv Detail & Related papers (2023-10-26T17:56:50Z)
A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing. Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data. Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
Releasing survey microdata with exact cluster locations and additional privacy safeguards [77.34726150561087]
We propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards. Our strategy reduces the respondents' re-identification risk for any number of disclosed attributes by 60-80% even under re-identification attempts.
arXiv Detail & Related papers (2022-05-24T19:37:11Z)
Lessons from the AdKDD'21 Privacy-Preserving ML Challenge [57.365745458033075]
A prominent proposal at W3C only allows sharing advertising signals through aggregated, differentially private reports of past displays. To study this proposal extensively, an open Privacy-Preserving Machine Learning Challenge took place at AdKDD'21. A key finding is that learning models on large, aggregated data in the presence of a small set of unaggregated data points can be surprisingly efficient and cheap.
arXiv Detail & Related papers (2022-01-31T11:09:59Z)
Mitigating Leakage from Data Dependent Communications in Decentralized Computing using Differential Privacy [1.911678487931003]
We propose a general execution model to control the data-dependence of communications in user-side decentralized computations. Our formal privacy guarantees leverage and extend recent results on privacy amplification by shuffling.
arXiv Detail & Related papers (2021-12-23T08:30:17Z)
PCAL: A Privacy-preserving Intelligent Credit Risk Modeling Framework Based on Adversarial Learning [111.19576084222345]
This paper proposes a framework of Privacy-preserving Credit risk modeling based on Adversarial Learning (PCAL) PCAL aims to mask the private information inside the original dataset, while maintaining the important utility information for the target prediction task performance. Results indicate that PCAL can learn an effective, privacy-free representation from user data, providing a solid foundation towards privacy-preserving machine learning for credit risk analysis.
arXiv Detail & Related papers (2020-10-06T07:04:59Z)
TIPRDC: Task-Independent Privacy-Respecting Data Crowdsourcing Framework for Deep Learning with Anonymized Intermediate Representations [49.20701800683092]
We present TIPRDC, a task-independent privacy-respecting data crowdsourcing framework with anonymized intermediate representation. The goal of this framework is to learn a feature extractor that can hide the privacy information from the intermediate representations; while maximally retaining the original information embedded in the raw data for the data collector to accomplish unknown learning tasks.
arXiv Detail & Related papers (2020-05-23T06:21:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.