Algorithmic Accountability in Small Data: Sample-Size-Induced Bias Within Classification Metrics
- URL: http://arxiv.org/abs/2505.03992v1
- Date: Tue, 06 May 2025 22:02:53 GMT
- Title: Algorithmic Accountability in Small Data: Sample-Size-Induced Bias Within Classification Metrics
- Authors: Jarren Briscoe, Garrett Kepler, Daryl Deford, Assefaw Gebremedhin
- Abstract summary: We show the significance of sample-size bias in classification metrics. This revelation challenges the efficacy of these metrics in assessing bias with high resolution. We propose a model-agnostic assessment and correction technique.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Evaluating machine learning models is crucial not only for determining their technical accuracy but also for assessing their potential societal implications. While the potential for low-sample-size bias in algorithms is well known, we demonstrate the significance of sample-size bias induced by combinatorics in classification metrics. This revelation challenges the efficacy of these metrics in assessing bias with high resolution, especially when comparing groups of disparate sizes, which frequently arise in social applications. We provide analyses of the bias that appears in several commonly applied metrics and propose a model-agnostic assessment and correction technique. Additionally, we analyze counts of undefined cases in metric calculations, which can lead to misleading evaluations if improperly handled. This work illuminates the previously unrecognized challenge of combinatorics and probability in standard evaluation practices and thereby advances approaches for fair and trustworthy classification.
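As a rough illustration of the combinatorial effect described in the abstract (not the paper's assessment or correction technique), the sketch below evaluates a classifier with fixed sensitivity and specificity on groups of different sizes: the mean F1 score and the rate of undefined scores both shift with group size. The prevalence and performance values are arbitrary assumptions.

```python
# Minimal simulation sketch: fixed underlying sensitivity/specificity, varying group size.
import numpy as np

rng = np.random.default_rng(0)
prevalence, sensitivity, specificity = 0.2, 0.8, 0.9
n_trials = 10_000

def f1_or_none(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if 2 * tp + fp + fn == 0:      # undefined case: no actual or predicted positives
        return None
    return 2 * tp / (2 * tp + fp + fn)

for n in (5, 10, 20, 50, 200):
    scores, undefined = [], 0
    for _ in range(n_trials):
        y_true = (rng.random(n) < prevalence).astype(int)
        flip = rng.random(n)
        y_pred = np.where(y_true == 1, flip < sensitivity, flip > specificity).astype(int)
        f1 = f1_or_none(y_true, y_pred)
        if f1 is None:
            undefined += 1
        else:
            scores.append(f1)
    print(f"n={n:4d}  mean F1={np.mean(scores):.3f}  undefined={undefined / n_trials:.1%}")
```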
Related papers
- Trustworthy Classification through Rank-Based Conformal Prediction Sets [9.559062601251464]
We propose a novel conformal prediction method that employs a rank-based score function suitable for classification models.
Our approach constructs prediction sets that achieve the desired coverage rate while managing their size.
Our contributions include a novel conformal prediction method, theoretical analysis, and empirical evaluation.
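For illustration only, a generic split-conformal sketch with one simple rank-based score, the rank of the true class in the model's sorted probability vector; the score function and set construction proposed in the paper may differ.

```python
import numpy as np

def rank_scores(probs, labels):
    order = np.argsort(-probs, axis=1)                   # classes sorted by probability
    ranks = np.empty_like(order)
    rows = np.arange(len(probs))[:, None]
    ranks[rows, order] = np.arange(probs.shape[1]) + 1   # rank 1 = most probable class
    return ranks[np.arange(len(labels)), labels]

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    scores = rank_scores(cal_probs, np.asarray(cal_labels))
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n               # finite-sample quantile level
    return np.quantile(scores, min(q, 1.0), method="higher")

def prediction_sets(test_probs, threshold):
    order = np.argsort(-test_probs, axis=1)
    return [set(row[: int(threshold)]) for row in order]  # all classes within the rank cutoff

# toy usage with random "probabilities" standing in for a trained classifier
rng = np.random.default_rng(0)
cal_p, cal_y = rng.dirichlet(np.ones(5), 200), rng.integers(0, 5, 200)
print(prediction_sets(rng.dirichlet(np.ones(5), 3), conformal_threshold(cal_p, cal_y)))
```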
arXiv Detail & Related papers (2024-07-05T10:43:41Z)
- $F_\beta$-plot -- a visual tool for evaluating imbalanced data classifiers [0.0]
The paper proposes a simple approach to analyzing the popular parametric metric $F_\beta$.
For a given pool of analyzed classifiers, the plot indicates when a given model should be preferred, depending on user requirements.
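A hedged sketch of the kind of plot this summary describes: $F_\beta$ as a function of $\beta$ for a small pool of classifiers, so a user can read off which model to prefer for their requirements. The dataset and models below are placeholders, not the paper's setup.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

# imbalanced toy dataset and two placeholder classifiers
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {"logreg": LogisticRegression(max_iter=1000),
          "forest": RandomForestClassifier(random_state=0)}
betas = np.linspace(0.2, 5.0, 50)

for name, model in models.items():
    y_hat = model.fit(X_tr, y_tr).predict(X_te)
    plt.plot(betas, [fbeta_score(y_te, y_hat, beta=b) for b in betas], label=name)

plt.xlabel(r"$\beta$"); plt.ylabel(r"$F_\beta$"); plt.legend(); plt.show()
```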
arXiv Detail & Related papers (2024-04-11T18:07:57Z)
- Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can! [0.30693357740321775]
We show in a systematic literature review that communication scholars largely ignore misclassification bias.
Existing statistical methods can use "gold standard" validation data, such as that created by human annotators, to correct misclassification bias.
We introduce and test such methods, including a new method we design and implement in the R package misclassificationmodels.
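The misclassificationmodels package is an R package and its API is not reproduced here; the Python sketch below only illustrates the underlying idea, estimating the classifier's sensitivity and specificity from a gold-standard subset and correcting an automated prevalence estimate (a proportion-level correction rather than the full regression correction studied in the paper).

```python
import numpy as np

def corrected_prevalence(auto_labels, gold_auto, gold_truth):
    gold_auto, gold_truth = np.asarray(gold_auto), np.asarray(gold_truth)
    sens = np.mean(gold_auto[gold_truth == 1])        # P(auto = 1 | truth = 1)
    spec = np.mean(1 - gold_auto[gold_truth == 0])    # P(auto = 0 | truth = 0)
    p_obs = np.mean(auto_labels)                      # naive automated estimate
    return (p_obs + spec - 1) / (sens + spec - 1)     # Rogan-Gladen-style correction

# simulated corpus: true prevalence 0.30, imperfect automated coder, 500 human-coded docs
rng = np.random.default_rng(1)
truth = (rng.random(10_000) < 0.3).astype(int)
auto = np.where(truth == 1, rng.random(10_000) < 0.8, rng.random(10_000) < 0.1).astype(int)
gold = rng.choice(10_000, 500, replace=False)
print(np.mean(auto), corrected_prevalence(auto, auto[gold], truth[gold]))
```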
arXiv Detail & Related papers (2023-07-12T23:03:55Z)
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
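A sketch of one common form of entropy-regularised parametric objective for this setting: supervised cross-entropy on labelled samples plus a term that maximises the entropy of the batch-averaged prediction on unlabelled samples to discourage collapse. The exact regulariser, weighting, and backbone in the paper may differ; the values below are illustrative.

```python
import torch
import torch.nn.functional as F

def gcd_loss(logits_labelled, targets, logits_unlabelled, reg_weight=1.0):
    ce = F.cross_entropy(logits_labelled, targets)        # supervised term on labelled data
    probs = F.softmax(logits_unlabelled, dim=1)
    mean_probs = probs.mean(dim=0)                        # batch-averaged prediction
    mean_entropy = -(mean_probs * torch.log(mean_probs + 1e-8)).sum()
    return ce - reg_weight * mean_entropy                 # subtracting = maximising entropy

# toy usage: random tensors stand in for a backbone's logits over 20 candidate classes
logits_l = torch.randn(8, 20, requires_grad=True)
logits_u = torch.randn(32, 20, requires_grad=True)
loss = gcd_loss(logits_l, torch.randint(0, 10, (8,)), logits_u)
loss.backward()
```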
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
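For orientation, a sketch of the standard PAC-Bayes-kl certificate (Maurer/Seeger form), not necessarily the paper's exact bound: numerically inverting the binary kl divergence turns an empirical risk, a posterior-prior KL term, the sample size, and a confidence level into a high-probability bound on the true risk.

```python
import math

def binary_kl(q, p):
    eps = 1e-12
    q, p = min(max(q, eps), 1 - eps), min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_inverse(emp_risk, bound):
    # largest r >= emp_risk with kl(emp_risk || r) <= bound, found by bisection
    lo, hi = emp_risk, 1.0 - 1e-9
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if binary_kl(emp_risk, mid) <= bound else (lo, mid)
    return hi

def pac_bayes_kl_certificate(emp_risk, kl_posterior_prior, n, delta=0.05):
    rhs = (kl_posterior_prior + math.log(2 * math.sqrt(n) / delta)) / n
    return kl_inverse(emp_risk, rhs)

print(pac_bayes_kl_certificate(emp_risk=0.05, kl_posterior_prior=50.0, n=20_000))
```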
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
- Information-Theoretic Bias Reduction via Causal View of Spurious Correlation [71.9123886505321]
We propose an information-theoretic bias measurement technique through a causal interpretation of spurious correlation.
We present a novel debiasing framework against algorithmic bias, which incorporates a bias regularization loss.
The proposed bias measurement and debiasing approaches are validated in diverse realistic scenarios.
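As a loose stand-in for the quantity involved, a naive plug-in estimate of the mutual information between discrete predictions and a protected attribute; the paper's measure is causally motivated and its debiasing uses a differentiable regularisation loss, so this is only an illustration of the information-theoretic idea.

```python
import numpy as np

def plugin_mutual_information(preds, attrs):
    preds, attrs = np.asarray(preds), np.asarray(attrs)
    joint = np.zeros((preds.max() + 1, attrs.max() + 1))
    for y, a in zip(preds, attrs):
        joint[y, a] += 1
    joint /= joint.sum()
    py = joint.sum(axis=1, keepdims=True)
    pa = joint.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log(joint / (py * pa))
    return float(np.nansum(terms))    # 0 nats: predictions carry no attribute information

print(plugin_mutual_information([0, 0, 1, 1, 1, 0], [0, 0, 1, 1, 0, 1]))
```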
arXiv Detail & Related papers (2022-01-10T01:19:31Z)
- Measure Twice, Cut Once: Quantifying Bias and Fairness in Deep Neural Networks [7.763173131630868]
We propose two metrics to quantitatively evaluate the class-wise bias of two models in comparison to one another.
By evaluating the performance of these new metrics and by demonstrating their practical application, we show that they can be used to measure fairness as well as bias.
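One straightforward way to compare two models class by class, not necessarily the two metrics the paper defines: compute per-class recall for each model on the same test set and inspect the gaps.

```python
import numpy as np

def per_class_recall(y_true, y_pred, n_classes):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.array([np.mean(y_pred[y_true == c] == c) for c in range(n_classes)])

def class_wise_gap(y_true, pred_a, pred_b, n_classes):
    # positive entries: model A is more accurate than model B on that class
    return per_class_recall(y_true, pred_a, n_classes) - per_class_recall(y_true, pred_b, n_classes)

y_true = [0, 0, 1, 1, 2, 2]
print(class_wise_gap(y_true, [0, 0, 1, 0, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3))
```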
arXiv Detail & Related papers (2021-10-08T22:35:34Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
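For context, a sketch of the conventional pairwise mechanism this summary contrasts against, a BPR-style loss with uniformly sampled negatives; this is the baseline, not the paper's proposed learning-to-rank method, and the embedding sizes and sampling are illustrative.

```python
import torch
import torch.nn.functional as F

def bpr_loss(user_emb, pos_item_emb, neg_item_emb):
    pos_scores = (user_emb * pos_item_emb).sum(dim=1)
    neg_scores = (user_emb * neg_item_emb).sum(dim=1)
    return -F.logsigmoid(pos_scores - neg_scores).mean()   # rank observed items above sampled ones

n_users, n_items, dim = 100, 500, 32
users, items = torch.nn.Embedding(n_users, dim), torch.nn.Embedding(n_items, dim)
u = torch.randint(0, n_users, (64,))
pos = torch.randint(0, n_items, (64,))    # items with observed implicit feedback
neg = torch.randint(0, n_items, (64,))    # uniformly sampled negatives
loss = bpr_loss(users(u), items(pos), items(neg))
loss.backward()
```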
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- Classifier uncertainty: evidence, potential impact, and probabilistic treatment [0.0]
We present an approach to quantify the uncertainty of classification performance metrics based on a probability model of the confusion matrix.
We show that uncertainties can be surprisingly large and limit performance evaluation.
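A sketch in the spirit of this summary: place a Dirichlet posterior over the cell probabilities of a binary confusion matrix and push posterior samples through a metric to obtain an uncertainty interval. The flat prior and the choice of accuracy as the metric are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

def metric_interval(tp, fp, fn, tn, n_samples=100_000, prior=1.0, seed=0):
    rng = np.random.default_rng(seed)
    samples = rng.dirichlet([tp + prior, fp + prior, fn + prior, tn + prior], n_samples)
    accuracy = samples[:, 0] + samples[:, 3]        # P(TP) + P(TN) under each draw
    return np.percentile(accuracy, [2.5, 97.5])

# a 20-example test set: point estimate 0.85, but the 95% interval is wide
print(metric_interval(tp=12, fp=1, fn=2, tn=5))
```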
arXiv Detail & Related papers (2020-06-19T12:49:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.