PRIVEE: A Visual Analytic Workflow for Proactive Privacy Risk Inspection
of Open Data
- URL: http://arxiv.org/abs/2208.06481v1
- Date: Fri, 12 Aug 2022 19:57:09 GMT
- Title: PRIVEE: A Visual Analytic Workflow for Proactive Privacy Risk Inspection
of Open Data
- Authors: Kaustav Bhattacharjee, Akm Islam, Jaideep Vaidya, and Aritra Dasgupta
- Abstract summary: Open data sets that contain personal information are susceptible to adversarial attacks even when anonymized.
We develop a visual analytic solution that enables data defenders to gain awareness about the disclosure risks in local, joinable data neighborhoods.
We use a problem and domain characterization, derived from a design study with data privacy researchers, to develop a set of visual analytic interventions as a defense mechanism.
- Score: 3.2136309934080867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open data sets that contain personal information are susceptible to
adversarial attacks even when anonymized. By performing low-cost joins on
multiple datasets with shared attributes, malicious users of open data portals
might get access to information that violates individuals' privacy. However,
open data sets are primarily published using a release-and-forget model,
whereby data owners and custodians have little to no cognizance of these
privacy risks. We address this critical gap by developing a visual analytic
solution that enables data defenders to gain awareness about the disclosure
risks in local, joinable data neighborhoods. The solution is derived through a
design study with data privacy researchers, where we initially play the role of
a red team and engage in an ethical data hacking exercise based on privacy
attack scenarios. We use this problem and domain characterization to develop a
set of visual analytic interventions as a defense mechanism and realize them in
PRIVEE, a visual risk inspection workflow that acts as a proactive monitor for
data defenders. PRIVEE uses a combination of risk scores and associated
interactive visualizations to let data defenders explore vulnerable joins and
interpret risks at multiple levels of data granularity. We demonstrate how
PRIVEE can help emulate the attack strategies and diagnose disclosure risks
through two case studies with data privacy experts.
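The join-based attack described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy records, the attribute names, the helper functions `shared_attributes` and `unique_join_matches`, and the simple "fraction of uniquely linked records" measure. It is not PRIVEE's risk score or implementation, only a rough picture of why shared attributes across open datasets can expose individuals.

```python
from collections import Counter

# Hypothetical toy records; neither dataset contains names.
health = [
    {"age": 34, "zip": "07102", "gender": "F", "diagnosis": "asthma"},
    {"age": 34, "zip": "07102", "gender": "M", "diagnosis": "diabetes"},
    {"age": 51, "zip": "07030", "gender": "F", "diagnosis": "flu"},
]
arrests = [
    {"age": 34, "zip": "07102", "gender": "F", "offense": "trespass"},
    {"age": 51, "zip": "07030", "gender": "F", "offense": "fraud"},
    {"age": 51, "zip": "07030", "gender": "F", "offense": "theft"},
]


def shared_attributes(left, right):
    """Attribute names present in every record of both datasets."""
    keys_left = set.intersection(*(set(r) for r in left))
    keys_right = set.intersection(*(set(r) for r in right))
    return sorted(keys_left & keys_right)


def unique_join_matches(left, right, keys):
    """Join on `keys`, keeping only pairs whose key value is unique on both sides.

    A one-to-one match is the risky case: it links exactly one record in
    each dataset, so the combined row may describe a single individual.
    """
    def key_of(record):
        return tuple(record[k] for k in keys)

    left_counts = Counter(key_of(r) for r in left)
    right_counts = Counter(key_of(r) for r in right)
    # Only looked up for keys that occur once on the right, so no ambiguity.
    right_by_key = {key_of(r): r for r in right}

    return [
        {**l, **right_by_key[key_of(l)]}
        for l in left
        if left_counts[key_of(l)] == 1 and right_counts.get(key_of(l), 0) == 1
    ]


keys = shared_attributes(health, arrests)
matches = unique_join_matches(health, arrests, keys)
# Illustrative measure only: share of `health` records linked one-to-one.
risk = len(matches) / len(health)
print(keys, len(matches), round(risk, 2))
```

In this toy case only one health record links one-to-one with an arrest record; the record with two candidate matches stays ambiguous, which is why such a sketch counts it as lower risk.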
Related papers
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face emerging challenges from the re-identification capabilities of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
- A Summary of Privacy-Preserving Data Publishing in the Local Setting [0.6749750044497732]
Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
arXiv Detail & Related papers (2023-12-19T04:23:23Z)
- $\alpha$-Mutual Information: A Tunable Privacy Measure for Privacy Protection in Data Sharing [4.475091558538915]
This paper adopts Arimoto's $\alpha$-Mutual Information as a tunable privacy measure.
We formulate a general distortion-based mechanism that manipulates the original data to offer privacy protection.
arXiv Detail & Related papers (2023-10-27T16:26:14Z)
- Where you go is who you are -- A study on machine learning based semantic privacy attacks [3.259843027596329]
We present a systematic analysis of two attack scenarios, namely location categorization and user profiling.
Experiments on the Foursquare dataset and tracking data demonstrate the potential for abuse of high-quality spatial information.
Our findings point out the risks of ever-growing databases of tracking data and spatial context data.
arXiv Detail & Related papers (2023-10-26T17:56:50Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Group privacy for personalized federated learning [4.30484058393522]
Federated learning is a type of collaborative machine learning, where participating clients process their data locally, sharing only updates to the collaborative model.
We propose a method to provide group privacy guarantees exploiting some key properties of $d$-privacy.
arXiv Detail & Related papers (2022-06-07T15:43:45Z)
- Releasing survey microdata with exact cluster locations and additional privacy safeguards [77.34726150561087]
We propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards.
Our strategy reduces the respondents' re-identification risk for any number of disclosed attributes by 60-80% even under re-identification attempts.
arXiv Detail & Related papers (2022-05-24T19:37:11Z)
- Lessons from the AdKDD'21 Privacy-Preserving ML Challenge [57.365745458033075]
A prominent proposal at W3C only allows sharing advertising signals through aggregated, differentially private reports of past displays.
To study this proposal extensively, an open Privacy-Preserving Machine Learning Challenge took place at AdKDD'21.
A key finding is that learning models on large, aggregated data in the presence of a small set of unaggregated data points can be surprisingly efficient and cheap.
arXiv Detail & Related papers (2022-01-31T11:09:59Z)
- Mitigating Leakage from Data Dependent Communications in Decentralized Computing using Differential Privacy [1.911678487931003]
We propose a general execution model to control the data-dependence of communications in user-side decentralized computations.
Our formal privacy guarantees leverage and extend recent results on privacy amplification by shuffling.
arXiv Detail & Related papers (2021-12-23T08:30:17Z)
- PCAL: A Privacy-preserving Intelligent Credit Risk Modeling Framework Based on Adversarial Learning [111.19576084222345]
This paper proposes a framework for Privacy-preserving Credit risk modeling based on Adversarial Learning (PCAL).
PCAL aims to mask the private information inside the original dataset, while maintaining the important utility information for the target prediction task performance.
Results indicate that PCAL can learn an effective, privacy-free representation from user data, providing a solid foundation towards privacy-preserving machine learning for credit risk analysis.
arXiv Detail & Related papers (2020-10-06T07:04:59Z)
- TIPRDC: Task-Independent Privacy-Respecting Data Crowdsourcing Framework for Deep Learning with Anonymized Intermediate Representations [49.20701800683092]
We present TIPRDC, a task-independent privacy-respecting data crowdsourcing framework with anonymized intermediate representation.
The goal of this framework is to learn a feature extractor that hides private information in the intermediate representations while retaining as much as possible of the original information embedded in the raw data, so that the data collector can accomplish unknown learning tasks.
arXiv Detail & Related papers (2020-05-23T06:21:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.