Crowdsourcing on Sensitive Data with Privacy-Preserving Text Rewriting
- URL: http://arxiv.org/abs/2303.03053v1
- Date: Mon, 6 Mar 2023 11:54:58 GMT
- Title: Crowdsourcing on Sensitive Data with Privacy-Preserving Text Rewriting
- Authors: Nina Mouhammad, Johannes Daxenberger, Benjamin Schiller, Ivan Habernal
- Abstract summary: Data labeling is often done on crowdsourcing platforms for scalability reasons.
However, publishing data on public platforms is only possible if no privacy-relevant information is included.
We investigate how removing personally identifiable information (PII) as well as applying differential privacy (DP) rewriting can enable text with privacy-relevant information to be used for crowdsourcing.
- Score: 9.409281517596396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most tasks in NLP require labeled data. Data labeling is often done on
crowdsourcing platforms for scalability reasons. However, data can only be published
on public platforms if no privacy-relevant information is included. Textual data often
contains sensitive information like person names or locations. In this work, we
investigate how removing personally identifiable information (PII) as well as applying
differential privacy (DP) rewriting can enable text with privacy-relevant information
to be used for crowdsourcing. We find that DP rewriting before crowdsourcing can
preserve privacy while still leading to good label quality for certain tasks and data.
PII removal led to good label quality in all examined tasks; however, it provides no
privacy guarantees.
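For illustration, the PII-removal baseline discussed in the abstract amounts to detecting and masking named entities (person names, locations, organizations) before the text is released to crowdworkers. The sketch below shows one way such a step could look; it is not the authors' pipeline, spaCy and its en_core_web_sm model are assumed choices here, and, as the abstract notes, this approach carries no formal privacy guarantee, unlike DP rewriting.

```python
# Minimal PII-removal sketch (illustrative only, not the authors' pipeline).
# Assumes spaCy and the "en_core_web_sm" model are installed:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

PII_LABELS = {"PERSON", "GPE", "LOC", "ORG"}  # entity types treated as PII in this sketch

nlp = spacy.load("en_core_web_sm")

def remove_pii(text: str) -> str:
    """Replace detected PII entities with placeholder tags like [PERSON]."""
    doc = nlp(text)
    pieces, last = [], 0
    for ent in doc.ents:
        if ent.label_ in PII_LABELS:
            pieces.append(text[last:ent.start_char])  # keep text before the entity
            pieces.append(f"[{ent.label_}]")          # mask the entity itself
            last = ent.end_char
    pieces.append(text[last:])                        # keep the trailing text
    return "".join(pieces)

print(remove_pii("Alice Miller met her doctor in Berlin last Tuesday."))
# typically prints: [PERSON] met her doctor in [GPE] last Tuesday.
```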
Related papers
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [55.20137833039499]
We suggest sanitizing sensitive text using two common strategies used by humans.
We curate the first corpus, coined NAP2, through both crowdsourcing and the use of large language models.
arXiv Detail & Related papers (2024-06-06T05:07:44Z)
- Personalized Differential Privacy for Ridge Regression [3.4751583941317166]
We introduce our novel Personalized-DP Output Perturbation method (PDP-OP) that enables training Ridge regression models with individual per-data-point privacy levels.
We provide rigorous privacy proofs for PDP-OP as well as accuracy guarantees for the resulting model.
We show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al.
arXiv Detail & Related papers (2024-01-30T16:00:14Z)
- Honesty is the Best Policy: On the Accuracy of Apple Privacy Labels Compared to Apps' Privacy Policies [13.771909487087793]
Apple introduced privacy labels in Dec. 2020 as a way for developers to report the privacy behaviors of their apps.
While Apple does not validate labels, it also requires developers to provide a privacy policy, which offers an important comparison point.
We fine-tuned BERT-based language models to extract privacy policy features for 474,669 apps on the iOS App Store.
arXiv Detail & Related papers (2023-06-29T16:10:18Z)
- The Overview of Privacy Labels and their Compatibility with Privacy Policies [24.871967983289117]
Privacy nutrition labels provide a way to understand an app's key data practices without reading the long and hard-to-read privacy policies.
Apple and Google have implemented mandates requiring app developers to fill in privacy nutrition labels highlighting their privacy practices.
arXiv Detail & Related papers (2023-03-14T20:10:28Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS), which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- Privacy-Aware Crowd Labelling for Machine Learning Tasks [3.6930948691311007]
We propose a privacy-preserving text labelling method for varying applications, based on crowdsourcing.
We transform text with different levels of privacy and analyse the effectiveness of the transformation with regard to label correlation and consistency.
arXiv Detail & Related papers (2022-02-03T18:14:45Z)
- Privacy Amplification via Shuffling for Linear Contextual Bandits [51.94904361874446]
We study the contextual linear bandit problem with differential privacy (DP).
We show that it is possible to achieve a privacy/utility trade-off between joint DP (JDP) and local DP (LDP) by leveraging the shuffle model of privacy while preserving local privacy.
arXiv Detail & Related papers (2021-12-11T15:23:28Z)
- BeeTrace: A Unified Platform for Secure Contact Tracing that Breaks Data Silos [73.84437456144994]
Contact tracing is an important method to control the spread of an infectious disease such as COVID-19.
Current solutions do not utilize the huge volume of data stored in business databases and individual digital devices.
We propose BeeTrace, a unified platform that breaks data silos and deploys state-of-the-art cryptographic protocols to guarantee privacy goals.
arXiv Detail & Related papers (2020-07-05T10:33:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.