Related papers: Correcting Underrepresentation and Intersectional Bias for Classification

URL: http://arxiv.org/abs/2306.11112v4
Date: Mon, 3 Jun 2024 20:57:56 GMT
Title: Correcting Underrepresentation and Intersectional Bias for Classification
Authors: Emily Diana, Alexander Williams Tolbert
Abstract summary: We consider the problem of learning from data corrupted by underrepresentation bias. We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates. We show that our algorithm permits efficient learning for model classes of finite VC dimension.
Score: 49.1574468325115
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of learning from data corrupted by underrepresentation bias, where positive examples are filtered from the data at different, unknown rates for a fixed number of sensitive groups. We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates, even in settings where intersectional group membership makes learning each intersectional rate computationally infeasible. Using these estimates, we construct a reweighting scheme that allows us to approximate the loss of any hypothesis on the true distribution, even if we only observe the empirical error on a biased sample. From this, we present an algorithm encapsulating this learning and reweighting process along with a thorough empirical investigation. Finally, we define a bespoke notion of PAC learnability for the underrepresentation and intersectional bias setting and show that our algorithm permits efficient learning for model classes of finite VC dimension.
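
Illustration: the reweighting idea in the abstract can be sketched under simplifying assumptions. The sketch below assumes disjoint groups (not the intersectional setting the paper addresses), assumes negatives are never dropped, and uses a plain odds-ratio estimator chosen for illustration; it is not the authors' algorithm. If positives in group g are kept with probability beta_g, the positive-to-negative odds in the biased sample equal beta_g times the odds in unbiased data, so their ratio estimates beta_g. The names `estimate_keep_rates` and `reweighted_error` are hypothetical.

```python
from collections import defaultdict


def estimate_keep_rates(biased, unbiased):
    """Estimate, per group, the probability that a positive example was kept.

    Both arguments are lists of (group, label) pairs with label in {0, 1}.
    Assumption: groups are disjoint and negatives are never filtered.
    """
    def odds(data):
        pos, neg = defaultdict(int), defaultdict(int)
        for group, label in data:
            if label == 1:
                pos[group] += 1
            else:
                neg[group] += 1
        # Positive-to-negative odds; groups with no negatives are skipped.
        return {g: pos[g] / neg[g] for g in neg}

    biased_odds, unbiased_odds = odds(biased), odds(unbiased)
    return {
        g: biased_odds[g] / unbiased_odds[g]
        for g in biased_odds
        if g in unbiased_odds and unbiased_odds[g] > 0
    }


def reweighted_error(predict, biased_examples, keep_rates):
    """Approximate the 0-1 error on the true distribution from a biased sample.

    biased_examples is a list of (x, group, label) triples. Each surviving
    positive in group g is up-weighted by 1 / keep_rates[g]; negatives keep
    weight 1. The result is the weighted fraction of misclassified examples.
    """
    total_weight = 0.0
    error_weight = 0.0
    for x, group, label in biased_examples:
        weight = 1.0 / keep_rates[group] if label == 1 else 1.0
        total_weight += weight
        if predict(x) != label:
            error_weight += weight
    return error_weight / total_weight
```

In this sketch the small unbiased sample is used only to estimate the keep rates; the hypothesis itself is evaluated on the larger biased sample with the corrected weights.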

Related papers

Size-adaptive Hypothesis Testing for Fairness [8.315080617799445]
We introduce a unified, size-adaptive, hypothesis-testing framework that turns fairness assessment into an evidence-based statistical decision. We prove a Central-Limit result for the statistical parity difference, leading to analytic confidence intervals and a Wald test whose type-I (false positive) error is guaranteed at level $\alpha$. For the long tail of small intersectional groups, we derive a fully Bayesian Dirichlet-multinomial estimator.
arXiv Detail & Related papers (2025-06-12T11:22:09Z)
Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions. We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z)
Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice. We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
Investigating the Effects of Fairness Interventions Using Pointwise Representational Similarity [12.879768345296718]
We introduce Pointwise Normalized Kernel Alignment (PNKA), a pointwise representational similarity measure. PNKA reveals previously unknown insights by measuring how debiasing measures affect the intermediate representations of individuals. We show that by evaluating representations using PNKA, we can reliably predict the behavior of ML models trained on these representations.
arXiv Detail & Related papers (2023-05-30T09:40:08Z)
D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases. A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network. For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
Learning to Split for Automatic Bias Detection [39.353850990332525]
Learning to Split (ls) is an algorithm for automatic bias detection. We evaluate our approach on Beer Review, CelebA and MNLI.
arXiv Detail & Related papers (2022-04-28T19:41:08Z)
Fair Group-Shared Representations with Normalizing Flows [68.29997072804537]
We develop a fair representation learning algorithm which is able to map individuals belonging to different groups into a single group. We show experimentally that our methodology is competitive with other fair representation learning algorithms.
arXiv Detail & Related papers (2022-01-17T10:49:49Z)
Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task. The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator. We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
Learning Unbiased Representations via Mutual Information Backpropagation [36.383338079229695]
In particular, we face the case where some attributes (bias) of the data, if learned by the model, can severely compromise its generalization properties. We propose a novel end-to-end optimization strategy, which simultaneously estimates and minimizes the mutual information between the learned representation and the data attributes.
arXiv Detail & Related papers (2020-03-13T18:06:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.