Related papers: Group-matching algorithms for subjects and items

URL: http://arxiv.org/abs/2110.04432v1
Date: Sat, 9 Oct 2021 02:44:31 GMT
Title: Group-matching algorithms for subjects and items
Authors: Géza Kiss and Kyle Gorman and Jan P.H. van Santen
Abstract summary: We consider the problem of constructing matched groups such that the resulting groups are statistically similar with respect to their average values. We show that the ldamatch package produces high-quality matches using artificial and real-world data sets.
Score: 6.739368462094944
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider the problem of constructing matched groups such that the resulting groups are statistically similar with respect to their average values for multiple covariates. This group-matching problem arises in many cases, including quasi-experimental and observational studies in which subjects or items are sampled from pre-existing groups, scenarios in which traditional pair-matching approaches may be inappropriate. We consider the case in which one is provided with an existing sample and iteratively eliminates samples so that the groups "match" according to arbitrary statistically-defined criteria. This problem is NP-hard. However, using artificial and real-world data sets, we show that heuristics implemented by the ldamatch package produce high-quality matches.
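The iterative elimination described in the abstract can be illustrated with a minimal sketch. This is not the ldamatch implementation (ldamatch is an R package, and its heuristics are not reproduced here). It assumes a single numeric covariate, a hypothetical matching criterion (absolute mean difference scaled by the standard deviation of the combined sample, at or below a threshold), and a simple greedy rule that drops, at each step, the one subject whose removal most improves the criterion. The names smd, greedy_match, threshold, and min_size are illustrative only.

```python
from statistics import mean, pstdev


def smd(a, b):
    """Absolute mean difference, scaled by the SD of the combined sample."""
    spread = pstdev(a + b)
    return abs(mean(a) - mean(b)) / spread if spread > 0 else 0.0


def greedy_match(a, b, threshold=0.1, min_size=2):
    """Drop one subject at a time until smd(a, b) <= threshold.

    Assumed greedy heuristic: at each step, remove the single subject
    (from either group) whose removal yields the lowest criterion value.
    Stops early if both groups have shrunk to min_size.
    """
    a, b = list(a), list(b)
    while smd(a, b) > threshold:
        best = None  # (criterion after removal, group index, item index)
        for g, group in enumerate((a, b)):
            if len(group) <= min_size:
                continue  # never shrink a group below min_size
            for i in range(len(group)):
                reduced = group[:i] + group[i + 1:]
                score = smd(reduced, b) if g == 0 else smd(a, reduced)
                if best is None or score < best[0]:
                    best = (score, g, i)
        if best is None:
            break  # no further removal is allowed
        _, g, i = best
        (a if g == 0 else b).pop(i)
    return a, b


if __name__ == "__main__":
    group_a, group_b = greedy_match([1, 2, 3, 4, 10], [1, 2, 3, 4])
    print(group_a, group_b, smd(group_a, group_b))
```

A greedy rule like this can discard more subjects than necessary, which is consistent with the abstract's point that the exact problem is NP-hard and is approached with heuristics.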

Related papers

Size-adaptive Hypothesis Testing for Fairness [8.315080617799445]
We introduce a unified, size-adaptive, hypothesis-testing framework that turns fairness assessment into an evidence-based statistical decision. We prove a Central-Limit result for the statistical parity difference, leading to analytic confidence intervals and a Wald test whose type-I (false positive) error is guaranteed at level $\alpha$. For the long tail of small intersectional groups, we derive a fully Bayesian Dirichlet-multinomial estimator.
arXiv Detail & Related papers (2025-06-12T11:22:09Z)
Towards Fair Representation: Clustering and Consensus [1.7243216387069678]
We find a consensus clustering that is not only representative but also fair with respect to specific protected attributes. As part of our investigation, we examine how to minimally modify an existing clustering to enforce fairness. We develop an optimal algorithm for datasets with equal group representation and near-linear time constant factor approximation algorithms.
arXiv Detail & Related papers (2025-06-10T10:33:21Z)
Interpretable Clustering with the Distinguishability Criterion [0.4419843514606336]
We present a global criterion called the Distinguishability criterion to quantify the separability of identified clusters and validate inferred cluster configurations. We propose a combined loss function-based computational framework that integrates the Distinguishability criterion with many commonly used clustering procedures. We present these new algorithms as well as the results from comprehensive data analysis based on simulation studies and real data applications.
arXiv Detail & Related papers (2024-04-24T16:38:15Z)
A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups. We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
Multi-Group Fairness Evaluation via Conditional Value-at-Risk Testing [24.553384023323332]
We propose an approach to test for performance disparities based on Conditional Value-at-Risk. We show that the sample complexity required for discovering performance violations is reduced exponentially, to be upper bounded by the square root of the number of groups.
arXiv Detail & Related papers (2023-12-06T19:25:32Z)
Statistical Performance Guarantee for Subgroup Identification with Generic Machine Learning [1.0878040851638]
We develop uniform confidence bands for estimation of the group average treatment effect sorted by a generic ML algorithm (GATES). We analyze a clinical trial of late-stage prostate cancer and find a relatively large proportion of exceptional responders.
arXiv Detail & Related papers (2023-10-12T01:41:47Z)
Concomitant Group Testing [49.50984893039441]
We introduce a variation of the group testing problem capturing the idea that a positive test requires a combination of multiple "types" of item. The goal is to reliably identify all of the semi-defective sets using as few tests as possible. Our algorithms are distinguished by (i) whether they are deterministic (zero-error) or randomized (small-error), and (ii) whether they are non-adaptive, fully adaptive, or have limited adaptivity.
arXiv Detail & Related papers (2023-09-08T09:11:12Z)
HiPerformer: Hierarchically Permutation-Equivariant Transformer for Time Series Forecasting [56.95572957863576]
We propose a hierarchically permutation-equivariant model that considers both the relationship among components in the same group and the relationship among groups. The experiments conducted on real-world data demonstrate that the proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2023-05-14T05:11:52Z)
Beyond Adult and COMPAS: Fairness in Multi-Class Prediction [8.405162568925405]
We formulate this problem in terms of "projecting" a pre-trained (and potentially unfair) classifier onto the set of models that satisfy target group-fairness requirements. We provide a parallelizable iterative algorithm for computing the projected classifier and derive both sample complexity and convergence guarantees. We also evaluate our method at scale on an open dataset with multiple classes, multiple intersectional protected groups, and over 1M samples.
arXiv Detail & Related papers (2022-06-15T20:29:33Z)
Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct the neural network predictions. We propose an algorithm that optimizes for the worst-off group assignments from a constraint set. We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z)
Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items. We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons. We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
Group Testing with Non-identical Infection Probabilities [59.96266198512243]
We develop an adaptive group testing algorithm using the set formation method. We show that our algorithm outperforms the state of the art, and performs close to the entropy lower bound.
arXiv Detail & Related papers (2021-08-27T17:53:25Z)
Testing Group Fairness via Optimal Transport Projections [12.972104025246091]
The proposed test is a flexible, interpretable, and statistically rigorous tool for auditing whether exhibited biases are due to the perturbation or to the randomness in the data. The statistical challenges, which may arise from multiple impact criteria that define group fairness, are conveniently tackled by projecting the empirical measure onto the set of group-fair probability models. The proposed framework can also be used to test composite intrinsic fairness hypotheses and fairness with multiple sensitive attributes.
arXiv Detail & Related papers (2021-06-02T10:51:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.