Explanatory Interactive Machine Learning for Bias Mitigation in Visual Gender Classification
- URL: http://arxiv.org/abs/2602.13286v1
- Date: Sat, 07 Feb 2026 13:41:42 GMT
- Title: Explanatory Interactive Machine Learning for Bias Mitigation in Visual Gender Classification
- Authors: Nathanya Satriani, Djordje Slijepčević, Markus Schedl, Matthias Zeppelzauer
- Abstract summary: Explanatory interactive learning (XIL) enables users to guide model training in machine learning (ML) by providing feedback on the model's explanations. We investigate two state-of-the-art XIL strategies, i.e., CAIPI and Right for the Right Reasons (RRR), as well as a novel hybrid approach that combines both strategies. Experimental results demonstrate the effectiveness of these methods in guiding ML models to focus on relevant image features.
- Score: 6.296044623811203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explanatory interactive learning (XIL) enables users to guide model training in machine learning (ML) by providing feedback on the model's explanations, thereby helping it to focus on features that are relevant to the prediction from the user's perspective. In this study, we explore the capability of this learning paradigm to mitigate bias and spurious correlations in visual classifiers, specifically in scenarios prone to data bias, such as gender classification. We investigate two methodologically different state-of-the-art XIL strategies, i.e., CAIPI and Right for the Right Reasons (RRR), as well as a novel hybrid approach that combines both strategies. The results are evaluated quantitatively by comparing segmentation masks with explanations generated using Gradient-weighted Class Activation Mapping (GradCAM) and Bounded Logit Attention (BLA). Experimental results demonstrate the effectiveness of these methods in (i) guiding ML models to focus on relevant image features, particularly when CAIPI is used, and (ii) reducing model bias (i.e., balancing the misclassification rates between male and female predictions). Our analysis further supports the potential of XIL methods to improve fairness in gender classifiers. Overall, the increased transparency and fairness obtained by XIL lead to slight performance decreases, with the exception of CAIPI, which shows potential to even improve classification accuracy.
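The two quantities the abstract evaluates, agreement between an explanation and a segmentation mask and the balance of misclassification rates between the two classes, can be illustrated with a minimal sketch. The exact metrics used in the paper are not given here, so both function names and both definitions below are assumptions chosen for illustration: the first measures the share of positive saliency that falls inside a binary mask, the second the absolute gap between per-class error rates.

```python
from typing import Sequence

Grid = Sequence[Sequence[float]]


def mask_attribution_share(saliency: Grid, mask: Grid) -> float:
    """Share of positive saliency inside a binary mask (hypothetical metric)."""
    inside = 0.0
    total = 0.0
    for sal_row, mask_row in zip(saliency, mask):
        for value, in_mask in zip(sal_row, mask_row):
            value = max(value, 0.0)  # negative attributions are ignored
            total += value
            if in_mask:
                inside += value
    # An all-zero saliency map carries no evidence, so report 0.0.
    return inside / total if total > 0 else 0.0


def error_rate_gap(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    groups: Sequence[str] = ("male", "female"),
) -> float:
    """Absolute difference between the error rates of two classes (assumed definition)."""
    rates = []
    for group in groups:
        indices = [i for i, label in enumerate(y_true) if label == group]
        if not indices:
            raise ValueError(f"no samples with true label {group!r}")
        errors = sum(1 for i in indices if y_pred[i] != group)
        rates.append(errors / len(indices))
    return abs(rates[0] - rates[1])


# Toy inputs: a 2x2 saliency map and a mask covering the bottom row.
toy_saliency = [[0.0, 0.25], [0.5, 0.25]]
toy_mask = [[0, 0], [1, 1]]
share = mask_attribution_share(toy_saliency, toy_mask)

# Toy labels: one of two "male" samples is misclassified, no "female" sample is.
gap = error_rate_gap(
    ["male", "male", "female", "female"],
    ["male", "female", "female", "female"],
)
```

Under these assumed definitions, a higher share means the explanation concentrates on the annotated region, and a gap closer to zero means the errors are more evenly balanced between the two classes.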
Related papers
- Did Models Sufficient Learn? Attribution-Guided Training via Subset-Selected Counterfactual Augmentation [61.248535801314375]
Subset-Selected Counterfactual Augmentation (SS-CA). We develop Counterfactual LIMA to identify minimal spatial region sets whose removal can selectively alter model predictions. Experiments show that SS-CA improves generalization on in-distribution (ID) test data and achieves superior performance on out-of-distribution (OOD) benchmarks.
arXiv Detail & Related papers (2025-11-15T08:39:22Z)
- Reference-Specific Unlearning Metrics Can Hide the Truth: A Reality Check [60.77691669644931]
We propose Functional Alignment for Distributional Equivalence (FADE), a novel metric that measures distributional similarity between unlearned and reference models. We show that FADE captures functional alignment across the entire output distribution, providing a principled assessment of genuine unlearning. These findings expose fundamental gaps in current evaluation practices and demonstrate that FADE provides a more robust foundation for developing and assessing truly effective unlearning methods.
arXiv Detail & Related papers (2025-10-14T20:50:30Z)
- Addressing Class Imbalance with Probabilistic Graphical Models and Variational Inference [10.457756074328664]
This study proposes a method for imbalanced data classification based on deep probabilistic graphical models (DPGMs). We introduce variational inference optimization probability modeling, which enables the model to adaptively adjust the representation ability of minority classes. We combine the adversarial learning mechanism to generate minority class samples in the latent space so that the model can better characterize the category boundary.
arXiv Detail & Related papers (2025-04-08T07:38:30Z)
- Gradient Extrapolation for Debiased Representation Learning [7.183424522250937]
Gradient Extrapolation for Debiased Representation Learning (GERNE) is designed to learn debiased representations in both known and unknown attribute training cases. Our analysis shows that when the extrapolated gradient points toward the batch gradient with fewer spurious correlations, it effectively guides training toward learning a debiased model.
arXiv Detail & Related papers (2025-03-17T14:48:57Z)
- Detecting and Mitigating Algorithmic Bias in Binary Classification using Causal Modeling [0.0]
We show that gender bias in the prediction model is statistically significant at the 0.05 level.
We demonstrate the effectiveness of the causal model in mitigating gender bias by cross-validation.
Our novel approach is intuitive, easy-to-use, and can be implemented using existing statistical software tools such as "lavaan" in R.
arXiv Detail & Related papers (2023-10-19T02:21:04Z)
- Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling.
Current approaches are categorized into feature imputation and label prediction.
This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z)
- Toward Fair Facial Expression Recognition with Improved Distribution Alignment [19.442685015494316]
We present a novel approach to mitigate bias in facial expression recognition (FER) models.
Our method aims to reduce sensitive attribute information such as gender, age, or race, in the embeddings produced by FER models.
For the first time, we analyze the notion of attractiveness as an important sensitive attribute in FER models and demonstrate that FER models can indeed exhibit biases towards more attractive faces.
arXiv Detail & Related papers (2023-06-11T14:59:20Z)
- D-CALM: A Dynamic Clustering-based Active Learning Approach for Mitigating Bias [13.008323851750442]
In this paper, we propose a novel adaptive clustering-based active learning algorithm, D-CALM, that dynamically adjusts clustering and annotation efforts.
Experiments on eight datasets for a diverse set of text classification tasks, including emotion, hate speech, dialog act, and book type detection, demonstrate that our proposed algorithm significantly outperforms baseline AL approaches.
arXiv Detail & Related papers (2023-05-26T15:17:43Z)
- Learning disentangled representations for explainable chest X-ray classification using Dirichlet VAEs [68.73427163074015]
This study explores the use of the Dirichlet Variational Autoencoder (DirVAE) for learning disentangled latent representations of chest X-ray (CXR) images.
The predictive capacity of multi-modal latent representations learned by DirVAE models is investigated through implementation of an auxiliary multi-label classification task.
arXiv Detail & Related papers (2023-02-06T18:10:08Z)
- Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
arXiv Detail & Related papers (2023-01-25T19:35:23Z)
- Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
arXiv Detail & Related papers (2021-09-22T10:47:51Z)
- MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
In this paper, we propose a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
arXiv Detail & Related papers (2021-07-23T06:57:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.