Unbiased Subdata Selection for Fair Classification: A Unified Framework
and Scalable Algorithms
- URL: http://arxiv.org/abs/2012.12356v2
- Date: Thu, 24 Dec 2020 05:20:18 GMT
- Title: Unbiased Subdata Selection for Fair Classification: A Unified Framework
and Scalable Algorithms
- Authors: Qing Ye and Weijun Xie
- Abstract summary: We show that many classification models within this framework can be recast as mixed-integer convex programs.
We then show that in the proposed framework, when the classification outcomes are known, the resulting problem, termed "unbiased subdata selection," is strongly polynomial-solvable.
This motivates us to develop an iterative refining strategy (IRS) to solve the large-scale instances.
- Score: 0.8376091455761261
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Classification, an important problem in modern data analytics, has
found a wide variety of applications across different domains. Unlike
conventional classification approaches, fair classification addresses
unintentional biases with respect to sensitive features (e.g., gender, race).
Due to the high nonconvexity of fairness measures, existing methods are often
unable to model exact fairness, which can cause inferior fair classification
outcomes. This paper fills the gap by developing a novel unified framework to
jointly optimize accuracy and fairness. The proposed framework is versatile:
it can precisely incorporate different fairness measures studied in the
literature and is applicable to many classifiers, including deep classification
models. Specifically, in this paper, we first prove Fisher consistency of the
proposed framework. We then show that many classification models within this
framework can be recast as mixed-integer convex programs, which can be solved
effectively by off-the-shelf solvers when the instance sizes are moderate and
can be used as benchmarks to compare the efficiency of approximation
algorithms. We prove that in the proposed framework, when the classification
outcomes are known, the resulting problem, termed "unbiased subdata selection,"
is strongly polynomial-solvable and can be used to enhance the classification
fairness by selecting more representative data points. This motivates us to
develop an iterative refining strategy (IRS) to solve the large-scale
instances, where we improve the classification accuracy and conduct the
unbiased subdata selection in an alternating fashion. We study the convergence
property of IRS and derive its approximation bound. More broadly, this
framework can be leveraged to improve classification models with unbalanced
data by taking the F1 score into consideration.
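To make the two quantities named in the abstract concrete, the sketch below computes one common fairness measure and the F1 score for a set of binary predictions. It is an illustration under stated assumptions, not the paper's method: the abstract does not say which fairness measures are used, so demographic parity (the gap in positive-prediction rates between two groups) is assumed here, and the function names `demographic_parity_gap` and `f1_score` and the toy data are hypothetical.

```python
from typing import Sequence


def demographic_parity_gap(y_pred: Sequence[int], group: Sequence[int]) -> float:
    """Absolute gap in positive-prediction rates between groups 0 and 1.

    Assumption: demographic parity is used purely as an example of a
    fairness measure; the abstract does not name a specific one.
    """
    rates = []
    for g in (0, 1):
        preds = [p for p, s in zip(y_pred, group) if s == g]
        if not preds:
            raise ValueError(f"group {g} has no samples")
        rates.append(sum(preds) / len(preds))
    return abs(rates[0] - rates[1])


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: true labels, predictions, and a binary sensitive feature.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

gap = demographic_parity_gap(y_pred, group)
f1 = f1_score(y_true, y_pred)
print(f"demographic parity gap = {gap:.2f}, F1 = {f1:.2f}")
```

A gap of zero would mean both groups receive positive predictions at the same rate. Reporting the gap alongside F1 rather than plain accuracy reflects the abstract's remark about unbalanced data.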
Related papers
- Optimal Group Fair Classifiers from Linear Post-Processing [10.615965454674901]
We propose a post-processing algorithm for fair classification that mitigates model bias under a unified family of group fairness criteria.
It achieves fairness by re-calibrating the output score of the given base model with a "fairness cost" -- a linear combination of the (predicted) group memberships.
arXiv Detail & Related papers (2024-05-07T05:58:44Z)
- Balanced Classification: A Unified Framework for Long-Tailed Object Detection [74.94216414011326]
Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories.
We introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of inequalities caused by disparities in category distribution.
BACL consistently achieves performance improvements across various datasets with different backbones and architectures.
arXiv Detail & Related papers (2023-08-04T09:11:07Z)
- Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment [54.179859639868646]
We propose a model agnostic post-processing framework xOrder for achieving fairness in bipartite ranking.
xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics.
We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Fair and Optimal Classification via Post-Processing [10.163721748735801]
This paper provides a complete characterization of the inherent tradeoff of demographic parity on classification problems.
We show that the minimum error rate achievable by randomized and attribute-aware fair classifiers is given by the optimal value of a Wasserstein-barycenter problem.
arXiv Detail & Related papers (2022-11-03T00:04:04Z)
- Exploring Category-correlated Feature for Few-shot Image Classification [27.13708881431794]
We present a simple yet effective feature rectification method by exploring the category correlation between novel and base classes as the prior knowledge.
The proposed approach consistently obtains considerable performance gains on three widely used benchmarks.
arXiv Detail & Related papers (2021-12-14T08:25:24Z)
- When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- Fairness with Overlapping Groups [15.154984899546333]
A standard goal is to ensure the equality of fairness metrics across multiple overlapping groups simultaneously.
We reconsider this standard fair classification problem using a probabilistic population analysis.
Our approach unifies a variety of existing group-fair classification methods and enables extensions to a wide range of non-decomposable multiclass performance metrics and fairness measures.
arXiv Detail & Related papers (2020-06-24T05:01:10Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)