Don't Look at the Data! How Differential Privacy Reconfigures the
Practices of Data Science
- URL: http://arxiv.org/abs/2302.11775v1
- Date: Thu, 23 Feb 2023 04:28:14 GMT
- Title: Don't Look at the Data! How Differential Privacy Reconfigures the
Practices of Data Science
- Authors: Jayshree Sarathy, Sophia Song, Audrey Haque, Tania Schlatter, Salil
Vadhan
- Abstract summary: Differential privacy (DP) is one promising way to offer privacy along with open access.
We conduct interviews with 19 data practitioners who are non-experts in DP.
We find that while DP is promising for providing wider access to sensitive datasets, it also introduces challenges into every stage of the data science workflow.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Across academia, government, and industry, data stewards are facing
increasing pressure to make datasets more openly accessible for researchers
while also protecting the privacy of data subjects. Differential privacy (DP)
is one promising way to offer privacy along with open access, but further
inquiry is needed into the tensions between DP and data science. In this study,
we conduct interviews with 19 data practitioners who are non-experts in DP as
they use a DP data analysis prototype to release privacy-preserving statistics
about sensitive data, in order to understand perceptions, challenges, and
opportunities around using DP. We find that while DP is promising for providing
wider access to sensitive datasets, it also introduces challenges into every
stage of the data science workflow. We identify ethics and governance questions
that arise when socializing data scientists around new privacy constraints and
offer suggestions to better integrate DP and data science.
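To make the workflow concrete, below is a minimal sketch of how a DP release of a single statistic typically works, using the standard Laplace mechanism; the dataset, bounds, and epsilon are illustrative assumptions, not values from the study.

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon, rng):
    """Release a differentially private mean via the Laplace mechanism.

    Clipping each value to [lower, upper] bounds one record's influence,
    giving the mean a sensitivity of (upper - lower) / n.
    """
    values = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(values)
    return values.mean() + rng.laplace(0.0, sensitivity / epsilon)

# Illustrative use on synthetic ages; bounds are set a priori.
rng = np.random.default_rng(0)
ages = rng.integers(18, 90, size=1_000)
print(dp_mean(ages, lower=18, upper=90, epsilon=1.0, rng=rng))
```

Note that the clipping bounds must be chosen without inspecting the raw values, which is exactly the "don't look at the data" constraint the interviews examine.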
Related papers
- DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing [0.8739101659113155]
We introduce an effective data publishing algorithm, DP-CDA.
Our proposed algorithm generates synthetic datasets by randomly mixing data in a class-specific manner and inducing carefully tuned randomness to ensure privacy guarantees (a sketch of the idea follows below).
Our results indicate that synthetic datasets produced using the DP-CDA can achieve superior utility compared to those generated by traditional data publishing algorithms, even when subject to the same privacy requirements.
arXiv Detail & Related papers (2024-11-25T06:14:06Z)
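The abstract above describes class-specific random mixing with tuned noise. The rough sketch below illustrates that general idea only; the mixing size k, the Gaussian noise, and the helper name are assumptions, not DP-CDA's calibrated mechanism.

```python
import numpy as np

def mix_and_perturb(X, y, k=4, noise_scale=0.1, rng=None):
    """Hypothetical sketch: average k random same-class records,
    then perturb the mixture. Assumes each class has >= k records."""
    rng = rng or np.random.default_rng()
    X_syn, y_syn = [], []
    for label in np.unique(y):
        X_c = X[y == label]
        for _ in range(len(X_c)):
            idx = rng.choice(len(X_c), size=k, replace=False)
            mixed = X_c[idx].mean(axis=0)
            X_syn.append(mixed + rng.normal(0.0, noise_scale, size=mixed.shape))
            y_syn.append(label)
    return np.array(X_syn), np.array(y_syn)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 3)), rng.integers(0, 2, size=100)
X_syn, y_syn = mix_and_perturb(X, y, rng=rng)
```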
- A Survey on Differential Privacy for SpatioTemporal Data in Transportation Research [0.9790236766474202]
In transportation, we are seeing a surge in spatiotemporal data collection.
Recent developments in differential privacy for such data have spurred research in applied privacy.
Significant work has been proposed to support research and inference on these data without exposing private information.
arXiv Detail & Related papers (2024-07-18T03:19:29Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been serious concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released (a common instance is sketched below).
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
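As a concrete instance of sanitized publishing, a noisy histogram released via the Gaussian mechanism is a common pattern; the epsilon, delta, and data below are illustrative assumptions, not from the paper.

```python
import numpy as np

def dp_histogram(data, bins, epsilon, delta, rng):
    """Sanitize a histogram with the Gaussian mechanism. One record
    changes a single bin by 1, so the L2 sensitivity is 1."""
    counts, edges = np.histogram(data, bins=bins)
    # Classic calibration for (epsilon, delta)-DP with sensitivity 1.
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    return np.clip(np.round(noisy), 0, None), edges

rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=10.0, sigma=1.0, size=5_000)
counts, edges = dp_histogram(incomes, bins=20, epsilon=0.5, delta=1e-5, rng=rng)
```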
- Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data (one standard approach is sketched below).
Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z)
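One standard technique for generating privacy-preserving graph data, covered by surveys like the one above, is randomized response on the adjacency matrix (edge-level local DP); a minimal sketch, with an illustrative epsilon:

```python
import numpy as np

def randomized_response_edges(adj, epsilon, rng):
    """Flip each potential edge with probability 1 / (1 + e^epsilon),
    yielding edge-level local DP for an undirected graph."""
    p_flip = 1.0 / (1.0 + np.exp(epsilon))
    flips = rng.random(adj.shape) < p_flip
    noisy = np.where(flips, 1 - adj, adj)
    noisy = np.triu(noisy, k=1)  # drop self-loops, keep one triangle
    return noisy + noisy.T       # re-symmetrize

rng = np.random.default_rng(0)
upper = np.triu((rng.random((6, 6)) < 0.3).astype(int), k=1)
adj = upper + upper.T
print(randomized_response_edges(adj, epsilon=2.0, rng=rng))
```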
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed Privacy Loss-Input Susceptibility (PLIS), which allows one to apportion a subject's privacy loss to their input attributes (a toy illustration follows below).
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
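PLIS itself is defined in the paper above; the toy sketch below only illustrates the underlying intuition of attributing a subject's gradient signal to individual input attributes, here for a linear model with squared loss. The attribution rule is a simplification, not the paper's metric.

```python
import numpy as np

def attribute_shares(x, y, w):
    """Toy attribution: per-attribute magnitude of the per-sample
    gradient of squared loss for a linear model, normalized to sum
    to 1. grad_j = 2 * (w.x - y) * x_j."""
    grad = 2.0 * (w @ x - y) * x
    mags = np.abs(grad)
    return mags / (mags.sum() + 1e-12)

rng = np.random.default_rng(1)
x, w = rng.normal(size=5), rng.normal(size=5)
print(attribute_shares(x, y=1.0, w=w))
```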
- Releasing survey microdata with exact cluster locations and additional privacy safeguards [77.34726150561087]
We propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards.
Our strategy reduces the respondents' re-identification risk by 60-80% for any number of disclosed attributes, even under re-identification attempts.
arXiv Detail & Related papers (2022-05-24T19:37:11Z)
- Personalized PATE: Differential Privacy for Machine Learning with Individual Privacy Guarantees [1.2691047660244335]
We propose three novel methods to support training an ML model with different personalized privacy guarantees within the training data (the underlying PATE aggregation is sketched below).
Our experiments show that our personalized privacy methods yield higher-accuracy models than the non-personalized baseline.
arXiv Detail & Related papers (2022-02-21T20:16:27Z)
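The personalized guarantees are the paper's contribution; the PATE building block it extends is a noisy-max vote over an ensemble of teacher models, sketched below with illustrative parameters.

```python
import numpy as np

def noisy_max_label(teacher_votes, num_classes, epsilon, rng):
    """Aggregate one query in PATE: add Laplace noise to per-class
    vote counts and release only the argmax label."""
    counts = np.bincount(teacher_votes, minlength=num_classes)
    noisy = counts + rng.laplace(0.0, 2.0 / epsilon, size=num_classes)
    return int(np.argmax(noisy))

rng = np.random.default_rng(2)
votes = rng.integers(0, 10, size=250)  # 250 teachers, 10 classes
print(noisy_max_label(votes, num_classes=10, epsilon=0.5, rng=rng))
```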
- Protecting Privacy and Transforming COVID-19 Case Surveillance Datasets for Public Use [0.4462475518267084]
The CDC has collected person-level, de-identified data from jurisdictions and currently has over 8 million records.
Data elements were included based on their usefulness, public requests, and privacy implications.
Specific field values were suppressed to reduce the risk of re-identification and exposure of confidential information (a suppression sketch follows below).
arXiv Detail & Related papers (2021-01-13T14:24:20Z)
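Suppression of this kind is typically rule-based. A minimal pandas sketch that blanks out rare field values below a cell-count threshold follows; the threshold, column, and token are assumptions, not CDC's actual disclosure rules.

```python
import pandas as pd

def suppress_rare_values(df, column, min_count=5, token="Suppressed"):
    """Blank out category values whose cell count falls below
    min_count, reducing re-identification risk from rare values."""
    counts = df[column].value_counts()
    rare = counts[counts < min_count].index
    out = df.copy()
    out.loc[out[column].isin(rare), column] = token
    return out

cases = pd.DataFrame({"county": ["A"] * 40 + ["B"] * 3 + ["C"] * 2})
print(suppress_rare_values(cases, "county")["county"].value_counts())
```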
- GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators [74.16405337436213]
We propose Gradient-Sanitized Wasserstein Generative Adversarial Networks (GS-WGAN).
GS-WGAN allows releasing a sanitized form of sensitive data with rigorous privacy guarantees (the sanitization step is sketched below).
We find our approach consistently outperforms state-of-the-art approaches across multiple metrics.
arXiv Detail & Related papers (2020-06-15T10:01:01Z)
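The core of gradient sanitization is to clip each gradient to a fixed norm and add calibrated Gaussian noise before it reaches the generator. Below is a framework-agnostic numpy sketch; the clip bound and noise multiplier are illustrative, not the paper's settings.

```python
import numpy as np

def sanitize_gradient(grad, clip_norm, noise_multiplier, rng):
    """Clip the gradient to an L2 norm of clip_norm, then add
    Gaussian noise scaled to that norm, as in gradient-sanitized
    GAN training."""
    norm = np.linalg.norm(grad)
    grad = grad * min(1.0, clip_norm / (norm + 1e-12))
    return grad + rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)

rng = np.random.default_rng(3)
g = rng.normal(size=128)
print(np.linalg.norm(sanitize_gradient(g, clip_norm=1.0, noise_multiplier=1.1, rng=rng)))
```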
This list is automatically generated from the titles and abstracts of the papers on this site.