Behind the Mask: Demographic bias in name detection for PII masking
- URL: http://arxiv.org/abs/2205.04505v1
- Date: Mon, 9 May 2022 18:21:41 GMT
- Title: Behind the Mask: Demographic bias in name detection for PII masking
- Authors: Courtney Mansfield, Amandalynne Paullada, Kristen Howell
- Abstract summary: We evaluate the performance of three off-the-shelf PII masking systems on name detection and redaction.
We find that an open-source RoBERTa-based system shows fewer disparities than the commercial models we test.
The highest error rates occurred for names associated with Black and Asian/Pacific Islander individuals.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many datasets contain personally identifiable information, or PII, which
poses privacy risks to individuals. PII masking is commonly used to redact
personal information such as names, addresses, and phone numbers from text
data. Most modern PII masking pipelines involve machine learning algorithms.
However, these systems may vary in performance, such that individuals from
particular demographic groups bear a higher risk for having their personal
information exposed. In this paper, we evaluate the performance of three
off-the-shelf PII masking systems on name detection and redaction. We generate
data using names and templates from the customer service domain. We find that
an open-source RoBERTa-based system shows fewer disparities than the commercial
models we test. However, all systems demonstrate significant differences in
error rate based on demographics. In particular, the highest error rates
occurred for names associated with Black and Asian/Pacific Islander
individuals.
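The evaluation pipeline the abstract describes (fill customer-service templates with names, run a masking system, and compare error rates across demographic groups) can be sketched as follows. This is a minimal illustration, not the paper's code: the templates, name lists, group labels, and the lexicon-based `toy_masker` are all invented placeholders standing in for the paper's actual data and the off-the-shelf systems it evaluates. The sketch only shows how a detector that misses underrepresented names produces the kind of per-group error-rate disparity the paper measures.

```python
# Hypothetical templates and name lists -- NOT the paper's data.
TEMPLATES = [
    "Hello, my name is {name} and I need help with my order.",
    "Hi, this is {name}, I was charged twice.",
]
NAMES_BY_GROUP = {
    "group_a": ["Alice", "Bob"],
    "group_b": ["Keisha", "DeShawn"],
}

# Toy name lexicon that happens to miss "DeShawn" -- a crude stand-in for
# ML detectors whose training data underrepresents some names.
KNOWN_NAMES = {"Alice", "Bob", "Keisha"}

def toy_masker(text: str) -> str:
    """Placeholder masker: redact any token found in the lexicon."""
    return " ".join(
        "[NAME]" if word.strip(",.") in KNOWN_NAMES else word
        for word in text.split()
    )

def group_error_rate(group: str) -> float:
    """Fraction of filled templates where the name survives masking."""
    trials = [(template.format(name=name), name)
              for template in TEMPLATES
              for name in NAMES_BY_GROUP[group]]
    misses = sum(1 for filled, name in trials
                 if name in toy_masker(filled))
    return misses / len(trials)

rates = {group: group_error_rate(group) for group in NAMES_BY_GROUP}
```

Here `group_a` is masked perfectly while half of `group_b`'s trials leak the name, mirroring (in toy form) the disparities reported for names associated with Black and Asian/Pacific Islander individuals.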
Related papers
- Differentially Private Data Release on Graphs: Inefficiencies and Unfairness
This paper characterizes the impact of Differential Privacy on bias and unfairness in the context of releasing information about networks.
We consider a network release problem where the network structure is known to all, but the weights on edges must be released privately.
Our work provides theoretical foundations and empirical evidence into the bias and unfairness arising due to privacy in these networked decision problems.
arXiv Detail & Related papers (2024-08-08T08:37:37Z)
- When Graph Convolution Meets Double Attention: Online Privacy Disclosure Detection with Multi-Label Text Classification
It is important to detect such unwanted privacy disclosures to help alert people affected and the online platform.
In this paper, privacy disclosure detection is modeled as a multi-label text classification problem.
A new privacy disclosure detection model is proposed to construct an MLTC classifier for detecting online privacy disclosures.
arXiv Detail & Related papers (2023-11-27T15:25:17Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy?
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- Unsupervised Text Deidentification
We propose an unsupervised deidentification method that masks words that leak personally-identifying information.
Motivated by K-anonymity based privacy, we generate redactions that ensure a minimum reidentification rank.
arXiv Detail & Related papers (2022-10-20T18:54:39Z)
- Assessing Demographic Bias Transfer from Dataset to Model: A Case Study in Facial Expression Recognition
Two metrics focus on the representational and stereotypical bias of the dataset, and the third one on the residual bias of the trained model.
We demonstrate the usefulness of the metrics by applying them to a FER problem based on the popular Affectnet dataset.
arXiv Detail & Related papers (2022-05-20T09:40:42Z)
- Unique on Facebook: Formulation and Evidence of (Nano)targeting Individual Users with non-PII Data
We define a data-driven model to quantify the number of interests from a user that make them unique on Facebook.
To the best of our knowledge, this represents the first study of individuals' uniqueness at the world population scale.
We run an experiment through 21 Facebook ad campaigns that target three of the authors of this paper.
arXiv Detail & Related papers (2021-10-13T11:00:22Z)
- Robustness Disparities in Commercial Face Detection
We present the first of its kind detailed benchmark of the robustness of three such systems: Amazon Rekognition, Microsoft Azure, and Google Cloud Platform.
We generally find that photos of individuals who are older, masculine-presenting, of darker skin type, or photographed in dim lighting are more susceptible to errors than their counterparts in other identities.
arXiv Detail & Related papers (2021-08-27T21:37:16Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
- Camera-aware Proxies for Unsupervised Person Re-Identification
This paper tackles the purely unsupervised person re-identification (Re-ID) problem that requires no annotations.
We propose to split each single cluster into multiple proxies and each proxy represents the instances coming from the same camera.
Based on the camera-aware proxies, we design both intra- and inter-camera contrastive learning components for our Re-ID model.
arXiv Detail & Related papers (2020-12-19T12:37:04Z)
- How important are faces for person re-identification?
We apply a face detection and blurring algorithm to create anonymized versions of several popular person re-identification datasets.
We evaluate the effect of this anonymization on re-identification performance using standard metrics.
arXiv Detail & Related papers (2020-10-13T11:47:16Z)
- Assessing Demographic Bias in Named Entity Recognition
We assess the bias in Named Entity Recognition systems for English across different demographic groups with synthetically generated corpora.
Character-based contextualized word representation models such as ELMo result in the least bias across demographics.
arXiv Detail & Related papers (2020-08-08T02:01:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.