Leveraging Structure for Improved Classification of Grouped Biased Data
- URL: http://arxiv.org/abs/2212.03697v1
- Date: Wed, 7 Dec 2022 15:18:21 GMT
- Title: Leveraging Structure for Improved Classification of Grouped Biased Data
- Authors: Daniel Zeiberg, Shantanu Jain, Predrag Radivojac
- Abstract summary: We consider semi-supervised binary classification for applications in which data points are naturally grouped.
We derive a semi-supervised algorithm that explicitly leverages the structure to learn an optimal, group-aware, probability-calibrated classifier.
- Score: 8.121462458089143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider semi-supervised binary classification for applications in which
data points are naturally grouped (e.g., survey responses grouped by state) and
the labeled data is biased (e.g., survey respondents are not representative of
the population). The groups overlap in the feature space and consequently the
input-output patterns are related across the groups. To model the inherent
structure in such data, we assume the partition-projected class-conditional
invariance across groups, defined in terms of the group-agnostic feature space.
We demonstrate that under this assumption, the group carries additional
information about the class, over the group-agnostic features, with provably
improved area under the ROC curve. Further assuming invariance of
partition-projected class-conditional distributions across both labeled and
unlabeled data, we derive a semi-supervised algorithm that explicitly leverages
the structure to learn an optimal, group-aware, probability-calibrated
classifier, despite the bias in the labeled data. Experiments on synthetic and
real data demonstrate the efficacy of our algorithm over suitable baselines and
ablative models, spanning standard supervised and semi-supervised learning
approaches, with and without incorporating the group directly as a feature.
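The abstract's core construction can be illustrated with a small sketch: if the class-conditional distributions of the group-agnostic features are shared across groups while the class priors differ per group, then conditioning on the group shifts the posterior, which is the sense in which the group carries information beyond the group-agnostic features. The snippet below is a minimal illustration under that simplifying assumption, not the paper's estimator; the function name `group_aware_posterior` and the toy Gaussian densities are made up for the example.

```python
import numpy as np

def group_aware_posterior(p1_x, p0_x, prior_g):
    """P(y=1 | x, g) assuming shared class-conditional densities across groups
    and a group-specific class prior prior_g (illustrative assumption only)."""
    num = prior_g * p1_x
    den = num + (1.0 - prior_g) * p0_x
    return num / den

# Toy example: both groups share the same class-conditional Gaussians,
# but differ in class prior pi_g; the group shifts the calibrated posterior.
x = np.linspace(-3, 3, 7)
p1 = np.exp(-0.5 * (x - 1.0) ** 2)   # unnormalized N(+1, 1) density for class y=1
p0 = np.exp(-0.5 * (x + 1.0) ** 2)   # unnormalized N(-1, 1) density for class y=0
for pi_g in (0.2, 0.8):              # group-specific class prior
    print(pi_g, np.round(group_aware_posterior(p1, p0, pi_g), 3))
```

In the toy run, the same feature value receives a noticeably different calibrated probability in the low-prior group (pi_g = 0.2) than in the high-prior group (pi_g = 0.8).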
Related papers
- A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
- Group-blind optimal transport to group parity and its constrained variants [6.70948761466883]
We design a single group-blind projection map that aligns the feature distributions of both groups in the source data.
We assume that the source data are unbiased representation of the population.
We present numerical results on synthetic data and real data.
arXiv Detail & Related papers (2023-10-17T17:14:07Z)
- Affinity Clustering Framework for Data Debiasing Using Pairwise Distribution Discrepancy [10.184056098238765]
Group imbalance, resulting from inadequate or unrepresentative data collection methods, is a primary cause of representation bias in datasets.
This paper presents MASC, a data augmentation approach that leverages affinity clustering to balance the representation of non-protected and protected groups of a target dataset.
arXiv Detail & Related papers (2023-06-02T17:18:20Z)
- Outlier-Robust Group Inference via Gradient Space Clustering [50.87474101594732]
Existing methods can improve the worst-group performance, but they require group annotations, which are often expensive and sometimes infeasible to obtain.
We address the problem of learning group annotations in the presence of outliers by clustering the data in the space of gradients of the model parameters.
We show that data in the gradient space has a simpler structure while preserving information about minority groups and outliers, making it suitable for standard clustering methods like DBSCAN (see the sketch after this list).
arXiv Detail & Related papers (2022-10-13T06:04:43Z)
- Addressing Missing Sources with Adversarial Support-Matching [8.53946780558779]
We investigate a scenario in which the absence of certain data is linked to the second level of a two-level hierarchy in the data.
Inspired by the idea of protected groups from algorithmic fairness, we refer to the partitions carved by this second level as "subgroups".
We make use of an additional, diverse but unlabeled dataset, called the "deployment set", to learn a representation that is invariant to subgroup.
arXiv Detail & Related papers (2022-03-24T16:19:19Z)
- Fair Group-Shared Representations with Normalizing Flows [68.29997072804537]
We develop a fair representation learning algorithm that is able to map individuals belonging to different groups into a single group.
We show experimentally that our methodology is competitive with other fair representation learning algorithms.
arXiv Detail & Related papers (2022-01-17T10:49:49Z)
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct the neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
arXiv Detail & Related papers (2020-10-06T16:42:51Z)
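As a rough illustration of the gradient-space clustering idea summarized in the Outlier-Robust Group Inference entry above, the snippet below computes per-sample log-loss gradients of a simple logistic model and runs DBSCAN on them to recover group-like clusters and flag outliers. The synthetic data, the logistic model, and all parameter values are assumptions for the example, not that paper's implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative sketch: per-sample gradients of the logistic loss w.r.t. the
# weights, then density-based clustering in that gradient space.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
w = rng.normal(size=5)                        # stand-in for trained weights
p = 1.0 / (1.0 + np.exp(-X @ w))              # model's predicted probabilities
grads = (p - y)[:, None] * X                  # per-sample log-loss gradient w.r.t. w
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(grads)
print(np.unique(labels, return_counts=True))  # label -1 marks points treated as outliers
```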