Sample Complexity of Bias Detection with Subsampled Point-to-Subspace Distances
- URL: http://arxiv.org/abs/2502.02623v1
- Date: Tue, 04 Feb 2025 14:03:49 GMT
- Title: Sample Complexity of Bias Detection with Subsampled Point-to-Subspace Distances
- Authors: German Martinez Matilla, Jakub Marecek,
- Abstract summary: Sample complexity of bias estimation is a lower bound on the runtime of any bias detection method.<n>We reformulate bias detection as a point-to-subspace problem on the space of measures and show that, for supremum norm, it can be subsampled efficiently.
- Score: 2.7624021966289596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sample complexity of bias estimation is a lower bound on the runtime of any bias detection method. Many regulatory frameworks require the bias to be tested for all subgroups, whose number grows exponentially with the number of protected attributes. Unless one wishes to run a bias detection with a doubly-exponential run-time, one should like to have polynomial complexity of bias detection for a single subgroup. At the same time, the reference data may be based on surveys, and thus come with non-trivial uncertainty. Here, we reformulate bias detection as a point-to-subspace problem on the space of measures and show that, for supremum norm, it can be subsampled efficiently. In particular, our probabilistically approximately correct (PAC) results are corroborated by tests on well-known instances.
Related papers
- Size-adaptive Hypothesis Testing for Fairness [8.315080617799445]
We introduce a unified, size-adaptive, hypothesis-testing framework that turns fairness assessment into an evidence-based statistical decision.<n>We prove a Central-Limit result for the statistical parity difference, leading to analytic confidence intervals and a Wald test whose type-I (false positive) error is guaranteed at level $alpha$.<n>For the long tail of small intersectional groups, we derive a fully Bayesian Dirichlet-multinomial estimator.
arXiv Detail & Related papers (2025-06-12T11:22:09Z) - Bias Detection via Maximum Subgroup Discrepancy [2.236957801565796]
We propose a new notion of distance, the Maximum Subgroup Discrepancy (MSD)<n>In this metric, two distributions are close if, roughly, discrepancies are low for all feature subgroups.<n>We provide a practical algorithm for the evaluation of the distance, based on Mixed-integer optimization (MIO)
arXiv Detail & Related papers (2025-02-04T11:01:03Z) - On Socially Fair Low-Rank Approximation and Column Subset Selection [62.44413238556872]
Low-rank approximation and column subset selection are two fundamental and related problems that are applied across a wealth of machine learning applications.
We show that surprisingly, even constant-factor approximation fair low-rank approximation requires exponential time under certain standard complexity hypotheses.
We give an algorithm for fair low-rank approximation that, for a constant number of groups and constant-factor accuracy, runs in $2textpoly(k)$ time rather than the na"ive $ntextpoly(k)$.
arXiv Detail & Related papers (2024-12-08T20:34:16Z) - Multi-Normal Prototypes Learning for Weakly Supervised Anomaly Detection [9.4765288592895]
Anomaly detection is a crucial task in various domains.<n>Most of the existing methods assume the normal sample data clusters around a single central prototype.<n>We propose a novel anomaly detection framework that can efficiently work with limited labeled anomalies.
arXiv Detail & Related papers (2024-08-23T18:27:58Z) - Generalization error of min-norm interpolators in transfer learning [2.7309692684728617]
Min-norm interpolators emerge naturally as implicit regularized limits of modern machine learning algorithms.
In many applications, a limited amount of test data may be available during training, yet properties of min-norm in this setting are not well-understood.
We establish a novel anisotropic local law to achieve these characterizations.
arXiv Detail & Related papers (2024-06-20T02:23:28Z) - Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias via either weighting the objective of each sample n by frac1p(u_n|b_n) or sampling that sample with a weight proportional to frac1p(u_n|b_n).
arXiv Detail & Related papers (2024-02-05T22:58:06Z) - Set Features for Anomaly Detection [44.76513792571765]
We propose set features that model each sample by the distribution of its elements.
We compute the anomaly score of each sample using a simple density estimation method.
Our approach outperforms the previous state-of-the-art in image-level logical anomaly detection and sequence-level time series anomaly detection.
arXiv Detail & Related papers (2023-11-24T13:56:06Z) - IBADR: an Iterative Bias-Aware Dataset Refinement Framework for
Debiasing NLU models [52.03761198830643]
We propose IBADR, an Iterative Bias-Aware dataset Refinement framework.
We first train a shallow model to quantify the bias degree of samples in the pool.
Then, we pair each sample with a bias indicator representing its bias degree, and use these extended samples to train a sample generator.
In this way, this generator can effectively learn the correspondence relationship between bias indicators and samples.
arXiv Detail & Related papers (2023-11-01T04:50:38Z) - Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of
Biases-Specific Experts [24.055919128977195]
We present the Partition-and-Debias (PnD) method that uses a mixture of biases-specific experts to implicitly divide the bias space into multiple subspaces.
Experiments on both public and constructed benchmarks demonstrated the efficacy of the PnD.
arXiv Detail & Related papers (2023-08-19T13:11:40Z) - On the Universal Adversarial Perturbations for Efficient Data-free
Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z) - Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z) - Hard-normal Example-aware Template Mutual Matching for Industrial Anomaly Detection [78.734927709231]
Anomaly detectors are widely used in industrial manufacturing to detect and localize unknown defects in query images.<n>These detectors are trained on anomaly-free samples and have successfully distinguished anomalies from most normal samples.<n>However, hard-normal examples are scattered and far apart from most normal samples, and thus they are often mistaken for anomalies by existing methods.
arXiv Detail & Related papers (2023-03-28T17:54:56Z) - Hierarchical Semi-Supervised Contrastive Learning for
Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning has provided a successful way to sample representation that enables effective discrimination on anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework, for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z) - Projected Sliced Wasserstein Autoencoder-based Hyperspectral Images
Anomaly Detection [42.585075865267946]
We propose the Projected Sliced Wasserstein (PSW) autoencoder-based anomaly detection method.
In particular, the computation-friendly eigen-decomposition method is leveraged to find the principal component for slicing the high-dimensional data.
Comprehensive experiments conducted on various real-world hyperspectral anomaly detection benchmarks demonstrate the superior performance of the proposed method.
arXiv Detail & Related papers (2021-12-20T09:21:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.