A Reduction to Binary Approach for Debiasing Multiclass Datasets
- URL: http://arxiv.org/abs/2205.15860v1
- Date: Tue, 31 May 2022 15:11:41 GMT
- Title: A Reduction to Binary Approach for Debiasing Multiclass Datasets
- Authors: Ibrahim Alabdulmohsin and Jessica Schrouff and Oluwasanmi Koyejo
- Abstract summary: We prove that R2B satisfies optimality and bias guarantees and demonstrate empirically that it can lead to an improvement over two baselines.
We validate these conclusions on synthetic and real-world datasets from social science, computer vision, and healthcare.
- Score: 12.885756277367443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel reduction-to-binary (R2B) approach that enforces
demographic parity for multiclass classification with non-binary sensitive
attributes via a reduction to a sequence of binary debiasing tasks. We prove
that R2B satisfies optimality and bias guarantees and demonstrate empirically
that it can lead to an improvement over two baselines: (1) treating multiclass
problems as multi-label by debiasing labels independently and (2) transforming
the features instead of the labels. Surprisingly, we also demonstrate that
independent label debiasing yields competitive results in most (but not all)
settings. We validate these conclusions on synthetic and real-world datasets
from social science, computer vision, and healthcare.
Related papers
- A Debiased Nearest Neighbors Framework for Multi-Label Text Classification [13.30576550077694]
We introduce a DEbiased Nearest Neighbors (DENN) framework for Multi-Label Text Classification (MLTC)
To address embedding alignment bias, we propose a debiased contrastive learning strategy, enhancing neighbor consistency on label co-occurrence.
For confidence estimation bias, we present a debiased confidence estimation strategy, improving the adaptive combination of predictions from $k$NN and inductive binary classifications.
arXiv Detail & Related papers (2024-08-06T14:00:23Z) - Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem.
We propose a consistent approach that does not rely on the uniform distribution assumption.
We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
arXiv Detail & Related papers (2023-11-27T02:59:17Z) - Generating Unbiased Pseudo-labels via a Theoretically Guaranteed
Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
threshold-to-pseudo label process (T2L) in classification uses confidence to determine the quality of label.
In nature, regression also requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
arXiv Detail & Related papers (2023-11-03T08:39:35Z) - Robust Representation Learning for Unreliable Partial Label Learning [86.909511808373]
Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth.
This is known as Unreliable Partial Label Learning (UPLL) that introduces an additional complexity due to the inherent unreliability and ambiguity of partial labels.
We propose the Unreliability-Robust Representation Learning framework (URRL) that leverages unreliability-robust contrastive learning to help the model fortify against unreliable partial labels effectively.
arXiv Detail & Related papers (2023-08-31T13:37:28Z) - GaussianMLR: Learning Implicit Class Significance via Calibrated
Multi-Label Ranking [0.0]
We propose a novel multi-label ranking method: GaussianMLR.
It aims to learn implicit class significance values that determine the positive label ranks.
We show that our method is able to accurately learn a representation of the incorporated positive rank order.
arXiv Detail & Related papers (2023-03-07T14:09:08Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - One-vs.-One Mitigation of Intersectional Bias: A General Method to
Extend Fairness-Aware Binary Classification [0.48733623015338234]
One-vs.-One Mitigation is a process of comparison between each pair of subgroups related to sensitive attributes to the fairness-aware machine learning for binary classification.
Our method mitigates the intersectional bias much better than conventional methods in all the settings.
arXiv Detail & Related papers (2020-10-26T11:35:39Z) - Pointwise Binary Classification with Pairwise Confidence Comparisons [97.79518780631457]
We propose pairwise comparison (Pcomp) classification, where we have only pairs of unlabeled data that we know one is more likely to be positive than the other.
We link Pcomp classification to noisy-label learning to develop a progressive URE and improve it by imposing consistency regularization.
arXiv Detail & Related papers (2020-10-05T09:23:58Z) - Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled
Learning and Conditional Generation with Extra Data [77.31213472792088]
The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems.
We address this problem by leveraging Positive-Unlabeled(PU) classification and the conditional generation with extra unlabeled data.
We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data.
arXiv Detail & Related papers (2020-06-14T08:27:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.