"What We Can't Measure, We Can't Understand": Challenges to Demographic
Data Procurement in the Pursuit of Fairness
- URL: http://arxiv.org/abs/2011.02282v2
- Date: Sat, 23 Jan 2021 01:18:19 GMT
- Title: "What We Can't Measure, We Can't Understand": Challenges to Demographic
Data Procurement in the Pursuit of Fairness
- Authors: McKane Andrus, Elena Spitzer, Jeffrey Brown, Alice Xiang
- Abstract summary: Algorithmic fairness practitioners often lack access to the demographic data they feel they need to detect bias in practice.
We investigated this dilemma through semi-structured interviews with 38 practitioners and professionals either working in or adjacent to algorithmic fairness.
Participants painted a complex picture of what demographic data availability and use look like on the ground.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As calls for fair and unbiased algorithmic systems increase, so too does the
number of individuals working on algorithmic fairness in industry. However,
these practitioners often do not have access to the demographic data they feel
they need to detect bias in practice. Even with the growing variety of toolkits
and strategies for working towards algorithmic fairness, they almost invariably
require access to demographic attributes or proxies. We investigated this
dilemma through semi-structured interviews with 38 practitioners and
professionals either working in or adjacent to algorithmic fairness.
Participants painted a complex picture of what demographic data availability
and use look like on the ground, ranging from not having access to personal
data of any kind to being legally required to collect and use demographic data
for discrimination assessments. In many domains, demographic data collection
raises a host of difficult questions, including how to balance privacy and
fairness, how to define relevant social categories, how to ensure meaningful
consent, and whether it is appropriate for private companies to infer someone's
demographics. Our research suggests challenges that must be considered by
businesses, regulators, researchers, and community groups in order to enable
practitioners to address algorithmic bias in practice. Critically, we do not
propose that the overall goal of future work should be to simply lower the
barriers to collecting demographic data. Rather, our study surfaces a swath of
normative questions about how, when, and whether this data should be procured,
and, in cases where it is not, what should still be done to mitigate bias.
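
The abstract's observation that fairness toolkits "almost invariably require access to demographic attributes or proxies" is visible in even the simplest bias check. Below is a minimal, hypothetical sketch (the metric choice, toy data, and group names are illustrative assumptions, not from the paper): a demographic parity gap simply cannot be computed without a demographic column.

```python
import numpy as np

def demographic_parity_gap(predictions, groups):
    """Spread of positive-prediction rates across demographic groups.

    `groups` is exactly the demographic data interviewees often cannot
    obtain; without it, this bias metric is not computable at all.
    """
    rates = {g: predictions[groups == g].mean() for g in np.unique(groups)}
    return max(rates.values()) - min(rates.values())

# Hypothetical toy data: the model favors group "a" over group "b"
preds = np.array([1, 0, 1, 1, 0, 1, 0, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(demographic_parity_gap(preds, groups))  # 0.5 -> large disparity
```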
Related papers
- Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information [50.29934517930506]
DAFair is a novel approach to addressing social bias in language models.
We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv Detail & Related papers (2024-03-14T15:58:36Z) - Human-Centric Multimodal Machine Learning: Recent Advances and Testbed
on AI-based Recruitment [66.91538273487379]
There is broad consensus on the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling
Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, such as weakening or deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset; a minimal illustrative sketch appears after this list.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Data Representativeness in Accessibility Datasets: A Meta-Analysis [7.6597163467929805]
We review datasets sourced by people with disabilities and older adults.
We find that accessibility datasets represent diverse ages, but have gender and race representation gaps.
We hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.
arXiv Detail & Related papers (2022-07-16T23:32:19Z) - Understanding Unfairness in Fraud Detection through Model and Data Bias
Interactions [4.159343412286401]
We argue that algorithmic unfairness stems from interactions between models and biases in the data.
We study a set of hypotheses regarding the fairness-accuracy trade-offs that fairness-blind ML algorithms exhibit under different data bias settings.
arXiv Detail & Related papers (2022-07-13T15:18:30Z) - Demographic-Reliant Algorithmic Fairness: Characterizing the Risks of
Demographic Data Collection in the Pursuit of Fairness [0.0]
We consider calls to collect more data on demographics to enable algorithmic fairness.
We show how these techniques largely ignore broader questions of data governance and systemic oppression.
arXiv Detail & Related papers (2022-04-18T04:50:09Z) - SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles [50.90773979394264]
This paper studies a model that protects the privacy of individuals' sensitive information while still learning non-discriminatory predictors.
A key characteristic of the proposed model is that it enables off-the-shelf, non-private fair models to be adopted in building a privacy-preserving and fair model; a PATE-style aggregation sketch appears after this list.
arXiv Detail & Related papers (2022-04-11T14:42:54Z) - Representation Bias in Data: A Survey on Identification and Resolution
Techniques [26.142021257838564]
Data-driven algorithms are only as good as the data they work with, yet data sets, especially social data, often fail to represent minorities adequately.
Representation bias in data can arise for various reasons, ranging from historical discrimination to selection and sampling biases in data acquisition and preparation.
This paper reviews the literature on identifying and resolving representation bias as a feature of a data set, independent of how the data are later consumed; a toy representation-rate check appears after this list.
arXiv Detail & Related papers (2022-03-22T16:30:22Z) - Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z) - When Fair Ranking Meets Uncertain Inference [5.33312437416961]
We show how demographic inferences drawn from real systems can lead to unfair rankings.
Our results suggest developers should not use inferred demographic data as input to fair ranking algorithms; a toy illustration of how noisy inference distorts a fairness audit appears after this list.
arXiv Detail & Related papers (2021-05-05T14:40:07Z) - Measuring Social Biases of Crowd Workers using Counterfactual Queries [84.10721065676913]
Social biases based on gender, race, and other attributes have been shown to pollute machine learning (ML) pipelines, predominantly via biased training datasets.
Crowdsourcing, a popular and cost-effective way to gather labeled training data, is not immune to the inherent social biases of crowd workers.
We propose a new method based on counterfactual fairness to quantify the degree of inherent social bias in each crowd worker; a minimal sketch of the counterfactual-query idea appears after this list.
arXiv Detail & Related papers (2020-04-04T21:41:55Z)