Related papers: Task-tailored Pre-processing: Fair Downstream Supervised Learning

URL: http://arxiv.org/abs/2601.11897v1
Date: Sat, 17 Jan 2026 03:49:50 GMT
Title: Task-tailored Pre-processing: Fair Downstream Supervised Learning
Authors: Jinwon Sohn, Guang Lin, Qifan Song
Abstract summary: We study algorithmic fairness for supervised learning and argue that the data fairness approaches impose overly strong regularization. This motivates us to devise a novel pre-processing approach tailored to supervised learning.
Score: 14.038820621511588
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fairness-aware machine learning has recently attracted various communities to mitigate discrimination against certain societal groups in data-driven tasks. For fair supervised learning, particularly in pre-processing, there have been two main categories: data fairness and task-tailored fairness. The former directly finds an intermediate distribution among the groups, independent of the type of the downstream model, so a learned downstream classification/regression model returns similar predictive scores to individuals inputting the same covariates irrespective of their sensitive attributes. The latter explicitly takes the supervised learning task into account when constructing the pre-processing map. In this work, we study algorithmic fairness for supervised learning and argue that the data fairness approaches impose overly strong regularization from the perspective of the HGR correlation. This motivates us to devise a novel pre-processing approach tailored to supervised learning. We account for the trade-off between fairness and utility in obtaining the pre-processing map. Then we study the behavior of arbitrary downstream supervised models learned on the transformed data to find sufficient conditions to guarantee their fairness improvement and utility preservation. To our knowledge, no prior work in the branch of task-tailored methods has theoretically investigated downstream guarantees when using pre-processed data. We further evaluate our framework through comparison studies based on tabular and image data sets, showing the superiority of our framework which preserves consistent trade-offs among multiple downstream models compared to recent competing models. Particularly for computer vision data, we see our method alters only necessary semantic features related to the central machine learning task to achieve fairness.

Related papers

Simulating Biases for Interpretable Fairness in Offline and Online Classifiers [0.35998666903987897]
Mitigation methods are critical to ensure that model outcomes are adjusted to be fair. We develop a framework for synthetic dataset generation with controllable bias injection. In experiments, both offline and online learning approaches are employed.
arXiv Detail & Related papers (2025-07-14T11:04:24Z)
Deep Fair Learning: A Unified Framework for Fine-tuning Representations with Sufficient Networks [8.616743904155419]
We propose a framework that integrates sufficient dimension reduction with deep learning to construct fair and informative representations. By introducing a novel penalty term during fine-tuning, our method enforces conditional independence between sensitive attributes and learned representations. Our approach achieves a superior balance between fairness and utility, significantly outperforming state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-08T22:24:22Z)
Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning [44.91863420044712]
In open-world semi-supervised learning, a machine learning model is tasked with uncovering novel categories from unlabeled data. We introduce 1) the adaptive synchronizing marginal loss which imposes class-specific negative margins to alleviate the model bias towards seen classes, and 2) the pseudo-label contrastive clustering which exploits pseudo-labels predicted by the model to group unlabeled data from the same category together. Our method balances the learning pace between seen and novel classes, achieving a remarkable 3% average accuracy increase on the ImageNet dataset.
arXiv Detail & Related papers (2023-09-21T09:44:39Z)
Model Debiasing via Gradient-based Explanation on Representation [14.673988027271388]
We propose a novel fairness framework that performs debiasing with regard to sensitive attributes and proxy attributes. Our framework achieves better fairness-accuracy trade-off on unstructured and structured datasets than previous state-of-the-art approaches.
arXiv Detail & Related papers (2023-05-20T11:57:57Z)
Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness. We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks. Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations. Our model jointly optimizes for two fairness criteria: group fairness and counterfactual fairness.
arXiv Detail & Related papers (2023-03-15T07:13:54Z)
Improving Fair Training under Correlation Shifts [33.385118640843416]
In particular, when the bias between labels and sensitive groups changes, the fairness of the trained model is directly influenced and can worsen. We analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness. We propose a novel pre-processing step that samples the input data to reduce correlation shifts.
arXiv Detail & Related papers (2023-02-05T07:23:35Z)
An Operational Perspective to Fairness Interventions: Where and How to Intervene [9.833760837977222]
We present a holistic framework for evaluating and contextualizing fairness interventions. We demonstrate our framework with a case study on predictive parity. We find predictive parity is difficult to achieve without using group data.
arXiv Detail & Related papers (2023-02-03T07:04:33Z)
Fair Inference for Discrete Latent Variable Models [12.558187319452657]
Machine learning models, trained on data without due care, often exhibit unfair and discriminatory behavior against certain populations. We develop a fair variational inference technique for the discrete latent variables, which is accomplished by including a fairness penalty on the variational distribution. To demonstrate the generality of our approach and its potential for real-world impact, we then develop a special-purpose graphical model for criminal justice risk assessments.
arXiv Detail & Related papers (2022-09-15T04:54:21Z)
Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning: simultaneously training a single model on multiple "upstream" and "downstream" tasks. We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data. We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions. Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise. We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
arXiv Detail & Related papers (2021-09-22T10:47:51Z)
Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning. We study whether models trained with uncertainty-based ALs are fairer in their decisions with respect to a protected class. We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair. We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data. A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.