Investigating the Effects of Fairness Interventions Using Pointwise Representational Similarity
- URL: http://arxiv.org/abs/2305.19294v2
- Date: Thu, 22 May 2025 11:00:27 GMT
- Title: Investigating the Effects of Fairness Interventions Using Pointwise Representational Similarity
- Authors: Camila Kolling, Till Speicher, Vedant Nanda, Mariya Toneva, Krishna P. Gummadi
- Abstract summary: We introduce Pointwise Normalized Kernel Alignment (PNKA), a pointwise representational similarity measure. PNKA reveals previously unknown insights by measuring how debiasing measures affect the intermediate representations of individuals. We show that by evaluating representations using PNKA, we can reliably predict the behavior of ML models trained on these representations.
- Score: 12.879768345296718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) algorithms can often exhibit discriminatory behavior, negatively affecting certain populations across protected groups. To address this, numerous debiasing methods, and consequently evaluation measures, have been proposed. Current evaluation measures for debiasing methods suffer from two main limitations: (1) they primarily provide a global estimate of unfairness, failing to provide a more fine-grained analysis, and (2) they predominantly analyze the model output on a specific task, failing to generalize the findings to other tasks. In this work, we introduce Pointwise Normalized Kernel Alignment (PNKA), a pointwise representational similarity measure that addresses these limitations by measuring how debiasing measures affect the intermediate representations of individuals. On tabular data, the use of PNKA reveals previously unknown insights: while group fairness predominantly influences a small subset of the population, maintaining high representational similarity for the majority, individual fairness constraints uniformly impact representations across the entire population, altering nearly every data point. We show that by evaluating representations using PNKA, we can reliably predict the behavior of ML models trained on these representations. Moreover, applying PNKA to language embeddings shows that existing debiasing methods may not perform as intended, failing to remove biases from stereotypical words and sentences. Our findings suggest that current evaluation measures for debiasing methods are insufficient, highlighting the need for a deeper understanding of the effects of debiasing methods, and show how pointwise representational similarity metrics can help with fairness audits.
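The abstract leaves the measure itself implicit. As a rough illustration, a pointwise kernel-alignment score in the spirit of PNKA can be computed by comparing, for each individual, how that individual relates to all other points in two representations. The following is a minimal NumPy sketch assuming a linear kernel and mean-centered features; the exact normalization used in the paper may differ.

```python
import numpy as np

def pointwise_kernel_alignment(A, B):
    """Pointwise similarity between two representations A (n x d1) and B (n x d2).

    For each point i, compare how i relates to all n points in representation A
    versus in representation B, via the cosine similarity of the i-th rows of
    the two linear kernel matrices. Returns a length-n vector of scores in [-1, 1].
    """
    A = A - A.mean(axis=0)          # center features
    B = B - B.mean(axis=0)
    K = A @ A.T                     # n x n linear kernel of representation A
    L = B @ B.T                     # n x n linear kernel of representation B
    num = (K * L).sum(axis=1)       # row-wise inner products <K_i, L_i>
    den = np.linalg.norm(K, axis=1) * np.linalg.norm(L, axis=1)
    return num / den
```

A point whose score drops after a debiasing intervention is one whose relation to the rest of the population changed; note that a linear kernel makes the score invariant to orthogonal transformations of the representation.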
Related papers
- EMO-Debias: Benchmarking Gender Debiasing Techniques in Multi-Label Speech Emotion Recognition [49.27067541740956]
EMO-Debias is a large-scale comparison of 13 debiasing methods applied to multi-label SER. Our study encompasses techniques from pre-processing, regularization, adversarial learning, biased learners, and distributionally robust optimization. Our analysis quantifies the trade-offs between fairness and accuracy, identifying which approaches consistently reduce gender performance gaps.
arXiv Detail & Related papers (2025-06-05T05:48:31Z) - Evaluate Bias without Manual Test Sets: A Concept Representation Perspective for LLMs [25.62533031580287]
Bias in Large Language Models (LLMs) significantly undermines their reliability and fairness. We propose BiasLens, a test-set-free bias analysis framework based on the structure of the model's vector space.
arXiv Detail & Related papers (2025-05-21T13:50:23Z) - ALVIN: Active Learning Via INterpolation [44.410677121415695]
Active Learning Via INterpolation (ALVIN) conducts intra-class generalizations between examples from under-represented and well-represented groups.
ALVIN identifies informative examples exposing the model to regions of the representation space that counteract the influence of shortcuts.
Experimental results on six datasets encompassing sentiment analysis, natural language inference, and paraphrase detection demonstrate that ALVIN outperforms state-of-the-art active learning methods.
arXiv Detail & Related papers (2024-10-11T16:44:39Z) - Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity [20.17288970927518]
We study the similarity of representations between the hidden layers of individual transformers.
We show that representations across layers are positively correlated, with similarity increasing as layers get closer.
We propose an aligned training method to improve the effectiveness of shallow layers.
arXiv Detail & Related papers (2024-06-20T16:41:09Z) - Does Machine Bring in Extra Bias in Learning? Approximating Fairness in Models Promptly [2.002741592555996]
Existing techniques for assessing the discrimination level of machine learning models include commonly used group and individual fairness measures.
We propose a "harmonic fairness measure via manifold (HFM)" based on distances between sets.
Empirical results indicate that the proposed fairness measure HFM is valid and that the proposed ApproxDist is effective and efficient.
arXiv Detail & Related papers (2024-05-15T11:07:40Z) - Weighted Point Cloud Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric [44.95433989446052]
We show the benefit of our proposed method through a new understanding of the contrastive loss of CLIP.
We show that our proposed similarity based on weighted point clouds consistently achieves the optimal similarity.
arXiv Detail & Related papers (2024-04-30T03:15:04Z) - Addressing Both Statistical and Causal Gender Fairness in NLP Models [22.75594773147521]
Statistical fairness stipulates equivalent outcomes for every protected group, whereas causal fairness prescribes that a model makes the same prediction for an individual regardless of their protected characteristics.
We demonstrate that combinations of statistical and causal debiasing techniques are able to reduce bias measured through both types of metrics.
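The distinction drawn in the entry above can be made concrete: a statistical metric compares outcome rates across protected groups, while a causal (counterfactual) probe asks whether flipping only the protected attribute changes an individual's prediction. Below is a minimal sketch, assuming a binary protected attribute and a model given as a plain function; both are hypothetical illustrations, not the paper's setup.

```python
import numpy as np

def statistical_parity_gap(preds, groups):
    """Absolute difference in positive-prediction rates between two groups
    (a simple statistical fairness metric for a binary group attribute)."""
    rates = [preds[groups == g].mean() for g in np.unique(groups)]
    return abs(rates[0] - rates[1])

def counterfactual_flip_rate(model, X, protected_col):
    """Fraction of individuals whose prediction changes when only their
    binary protected attribute is flipped -- a simple causal-fairness probe."""
    X_cf = X.copy()
    X_cf[:, protected_col] = 1 - X_cf[:, protected_col]
    return (model(X) != model(X_cf)).mean()
```

A model that ignores the protected column scores a flip rate of zero yet can still fail the statistical metric, and vice versa, which is why the entry's combined debiasing strategy is needed.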
arXiv Detail & Related papers (2024-03-30T20:05:41Z) - Towards out-of-distribution generalization in large-scale astronomical surveys: robust networks learn similar representations [3.653721769378018]
We use Centered Kernel Alignment (CKA), a similarity measure metric of neural network representations, to examine the relationship between representation similarity and performance.
We find that when models are robust to a distribution shift, they produce substantially similar representations across their layers on OOD data.
We discuss the potential application of similarity representation in guiding model design, training strategy, and mitigating the OOD problem by incorporating CKA as an inductive bias during training.
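Unlike the pointwise measure in the main paper, CKA returns a single global similarity score per pair of representations. A minimal NumPy sketch of the linear variant follows; the kernel choice and any preprocessing used in the paper above are assumptions here.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between representations X (n x d1)
    and Y (n x d2) of the same n examples: one global score in [0, 1]."""
    X = X - X.mean(axis=0)                      # center each feature
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, 'fro') ** 2   # alignment of the two kernels
    den = np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')
    return num / den
```

Because the score is invariant to orthogonal transformations and isotropic scaling, it compares the geometry of two layers rather than their raw coordinates, which is what makes it suitable for the layer-wise comparisons described above.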
arXiv Detail & Related papers (2023-11-29T19:00:05Z) - Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z) - When mitigating bias is unfair: multiplicity and arbitrariness in algorithmic group fairness [8.367620276482056]
We introduce the FRAME (FaiRness Arbitrariness and Multiplicity Evaluation) framework, which evaluates bias mitigation through five dimensions.
Applying FRAME to various bias mitigation approaches across key datasets allows us to exhibit significant differences in the behaviors of debiasing methods.
These findings highlight the limitations of current fairness criteria and the inherent arbitrariness in the debiasing process.
arXiv Detail & Related papers (2023-02-14T16:53:52Z) - Beyond Instance Discrimination: Relation-aware Contrastive Self-supervised Learning [75.46664770669949]
We present relation-aware contrastive self-supervised learning (ReCo) to integrate instance relations.
Our ReCo consistently gains remarkable performance improvements.
arXiv Detail & Related papers (2022-11-02T03:25:28Z) - Measuring the Interpretability of Unsupervised Representations via Quantized Reverse Probing [97.70862116338554]
We investigate the problem of measuring interpretability of self-supervised representations.
We formulate the latter as estimating the mutual information between the representation and a space of manually labelled concepts.
We use our method to evaluate a large number of self-supervised representations, ranking them by interpretability.
arXiv Detail & Related papers (2022-09-07T16:18:50Z) - Not All Instances Contribute Equally: Instance-adaptive Class Representation Learning for Few-Shot Visual Recognition [94.04041301504567]
Few-shot visual recognition refers to recognizing novel visual concepts from a few labeled instances.
We propose a novel metric-based meta-learning framework termed instance-adaptive class representation learning network (ICRL-Net) for few-shot visual recognition.
arXiv Detail & Related papers (2022-09-07T10:00:18Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Information-Theoretic Bias Reduction via Causal View of Spurious Correlation [71.9123886505321]
We propose an information-theoretic bias measurement technique through a causal interpretation of spurious correlation.
We present a novel debiasing framework against the algorithmic bias, which incorporates a bias regularization loss.
The proposed bias measurement and debiasing approaches are validated in diverse realistic scenarios.
arXiv Detail & Related papers (2022-01-10T01:19:31Z) - Information-Theoretic Bias Assessment Of Learned Representations Of Pretrained Face Recognition [18.07966649678408]
We propose an information-theoretic, independent bias assessment metric to identify degree of bias against protected demographic attributes.
Our metric differs from other methods that rely on classification accuracy or examine the differences between ground truth and predicted labels of protected attributes predicted using a shallow network.
arXiv Detail & Related papers (2021-11-08T17:41:17Z) - Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
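The idea described above, encouraging instances that share a class label to have similar representations, is typically realized with a supervised contrastive loss. The following is a simplified NumPy sketch; the temperature value and normalization details are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def supervised_contrastive_loss(Z, labels, tau=0.1):
    """Simplified supervised contrastive loss: for each anchor, pull together
    representations sharing its label and push apart the rest."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)    # L2-normalize rows
    sim = Z @ Z.T / tau                                  # cosine similarity / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, sim)           # exclude self-pairs
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # average log-probability over each anchor's positive pairs
    loss = -np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
    return loss.mean()
```

The loss is low when same-label points are already clustered and high otherwise, which is the gradient signal that pushes the encoder toward label-aligned (and, in the paper's fairness setting, less demographically biased) representations.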
arXiv Detail & Related papers (2021-09-22T10:47:51Z) - Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
arXiv Detail & Related papers (2021-09-17T13:45:46Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - Instance Similarity Learning for Unsupervised Feature Representation [83.31011038813459]
We propose an instance similarity learning (ISL) method for unsupervised feature representation.
We employ the Generative Adversarial Networks (GAN) to mine the underlying feature manifold.
Experiments on image classification demonstrate the superiority of our method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-08-05T16:42:06Z) - Toward Scalable and Unified Example-based Explanation and Outlier Detection [128.23117182137418]
We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction.
We show that our prototype-based networks, which go beyond similarity kernels, deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
arXiv Detail & Related papers (2020-11-11T05:58:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.