Reducing Biases in Record Matching Through Scores Calibration
- URL: http://arxiv.org/abs/2411.01685v2
- Date: Wed, 25 Jun 2025 21:36:23 GMT
- Title: Reducing Biases in Record Matching Through Scores Calibration
- Authors: Mohammad Hossein Moslemi, Mostafa Milani
- Abstract summary: We propose a threshold-independent framework for measuring and reducing score bias. We show that several state-of-the-art matching methods exhibit substantial score bias, even when appearing fair under standard threshold-based metrics. We introduce two post-processing score calibration algorithms. The first, calib, aligns group-wise score distributions using the Wasserstein barycenter, targeting demographic parity. The second, ccalib, conditions on predicted labels to further reduce label-dependent biases, such as equal opportunity.
- Score: 1.5530839016602822
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Record matching is the task of identifying records that refer to the same real-world entity across datasets. While most existing models optimize for accuracy, fairness has become an important concern due to the potential for unequal outcomes across demographic groups. Prior work typically focuses on binary outcomes evaluated at fixed decision thresholds. However, such evaluations can miss biases in matching scores--biases that persist across thresholds and affect downstream tasks. We propose a threshold-independent framework for measuring and reducing score bias, defined as disparities in the distribution of matching scores across groups. We show that several state-of-the-art matching methods exhibit substantial score bias, even when appearing fair under standard threshold-based metrics. To address this, we introduce two post-processing score calibration algorithms. The first, calib, aligns group-wise score distributions using the Wasserstein barycenter, targeting demographic parity. The second, ccalib, conditions on predicted labels to further reduce label-dependent biases, such as equal opportunity. Both methods are model-agnostic and require no access to model training data. calib also offers theoretical guarantees, ensuring reduced bias with minimal deviation from original scores. Experiments across real-world datasets and matching models confirm that calib and ccalib substantially reduce score bias while minimally impacting model accuracy.
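As a rough, self-contained illustration of the barycenter idea behind calib (this is not the authors' implementation; the function name, the fixed 101-point quantile grid, and weighting by group size are assumptions made here for the sketch), the snippet below maps each group's empirical score distribution onto their one-dimensional Wasserstein-2 barycenter by averaging group-wise quantile functions:

```python
import numpy as np

def calibrate_scores(scores, groups, n_quantiles=101):
    """Sketch of barycenter-based score calibration across groups.

    Each group's empirical score distribution is pushed onto the 1-D
    Wasserstein-2 barycenter of all group distributions, which in one
    dimension amounts to averaging quantile functions.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    qs = np.linspace(0.0, 1.0, n_quantiles)          # common quantile grid
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()                   # group proportions (assumed weighting)

    # Quantile function of each group's empirical score distribution.
    group_quant = {g: np.quantile(scores[groups == g], qs) for g in labels}

    # Barycenter quantile function = weighted average of group quantile functions.
    bary_quant = sum(w * group_quant[g] for g, w in zip(labels, weights))

    calibrated = np.empty_like(scores)
    for g in labels:
        s = scores[groups == g]
        # Empirical CDF value of each score within its own group ...
        u = np.searchsorted(np.sort(s), s, side="right") / len(s)
        # ... pushed through the barycenter quantile function.
        calibrated[groups == g] = np.interp(u, qs, bary_quant)
    return calibrated
```

Under the same reading of the abstract, ccalib would apply an analogous mapping separately within each predicted-label stratum, so that label-dependent criteria such as equal opportunity are also addressed.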
Related papers
- Whence Is A Model Fair? Fixing Fairness Bugs via Propensity Score Matching [0.49157446832511503]
We investigate whether the way training and testing data are sampled affects the reliability of fairness metrics. Since training and test sets are often randomly sampled from the same population, bias present in the training data may still exist in the test data. We propose FairMatch, a post-processing method that applies propensity score matching to evaluate and mitigate bias.
arXiv Detail & Related papers (2025-04-23T19:28:30Z) - Mitigating Spurious Correlations via Disagreement Probability [4.8884049398279705]
Models trained with empirical risk minimization (ERM) are prone to be biased towards spurious correlations between target labels and bias attributes. We introduce a training objective designed to robustly enhance model performance across all data samples. We then derive a debiasing method, Disagreement Probability based Resampling for debiasing (DPR), which does not require bias labels.
arXiv Detail & Related papers (2024-11-04T02:44:04Z) - MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z) - Fair-OBNC: Correcting Label Noise for Fairer Datasets [9.427445881721814]
Biases in the training data are sometimes related to label noise.
Models trained on such biased data may perpetuate or even aggravate the biases with respect to sensitive information.
We propose Fair-OBNC, a label noise correction method with fairness considerations.
arXiv Detail & Related papers (2024-10-08T17:18:18Z) - Semi-supervised Learning For Robust Speech Evaluation [30.593420641501968]
Speech evaluation measures a learner's oral proficiency using automatic models.
This paper proposes to address such challenges by exploiting semi-supervised pre-training and objective regularization.
An anchor model is trained using pseudo labels to predict the correctness of pronunciation.
arXiv Detail & Related papers (2024-09-23T02:11:24Z) - Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - Score Normalization for Demographic Fairness in Face Recognition [16.421833444307232]
Well-known sample-centered score normalization techniques, Z-norm and T-norm, do not improve fairness for high-security operating points.
We extend the standard Z/T-norm to integrate demographic information in normalization.
We show that our techniques generally improve the overall fairness of five state-of-the-art pre-trained face recognition networks.
arXiv Detail & Related papers (2024-07-19T07:51:51Z) - Threshold-Independent Fair Matching through Score Calibration [1.5530839016602822]
We introduce a new approach in Entity Matching (EM) using recent metrics for evaluating biases in score-based binary classification.
This approach enables the application of various bias metrics like equalized odds, equal opportunity, and demographic parity without depending on threshold settings.
This paper contributes to the field of fairness in data cleaning, especially within EM, by promoting a method for generating matching scores that reduce biases across different thresholds.
arXiv Detail & Related papers (2024-05-30T13:37:53Z) - Systematic analysis of the impact of label noise correction on ML Fairness [0.0]
We develop an empirical methodology to evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets.
Our results suggest that the Hybrid Label Noise Correction method achieves the best trade-off between predictive performance and fairness.
arXiv Detail & Related papers (2023-06-28T08:08:14Z) - Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z) - Improving Fair Training under Correlation Shifts [33.385118640843416]
In particular, when the bias between labels and sensitive groups changes, the fairness of the trained model is directly influenced and can worsen.
We analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness.
We propose a novel pre-processing step that samples the input data to reduce correlation shifts.
arXiv Detail & Related papers (2023-02-05T07:23:35Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Improving Evaluation of Debiasing in Image Classification [29.711865666774017]
Our study indicates that several issues need to be addressed when evaluating debiasing in image classification.
Based on these issues, this paper proposes an evaluation metric, the Align-Conflict (AC) score, for the tuning criterion.
We believe our findings and lessons inspire future researchers in debiasing to further push state-of-the-art performances with fair comparisons.
arXiv Detail & Related papers (2022-06-08T05:24:13Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets [58.53269361115974]
Diagnostic datasets that can detect biased models are an important prerequisite for bias reduction within natural language processing.
Undesired patterns in the collected data can make such tests incorrect.
We introduce a theoretically grounded method for weighting test samples to cope with such patterns in the test data.
arXiv Detail & Related papers (2020-11-03T16:50:13Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.