Balanced Filtering via Disclosure-Controlled Proxies
- URL: http://arxiv.org/abs/2306.15083v3
- Date: Mon, 17 Jun 2024 19:21:28 GMT
- Title: Balanced Filtering via Disclosure-Controlled Proxies
- Authors: Siqi Deng, Emily Diana, Michael Kearns, Aaron Roth
- Abstract summary: We study the problem of collecting a cohort that is balanced with respect to sensitive groups when group membership is unavailable or prohibited from use at deployment time. Our deployment-time collection mechanism does not reveal significantly more about the group membership of any individual sample than can be ascertained from base rates alone.
- Score: 8.477632486444353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of collecting a cohort or set that is balanced with respect to sensitive groups when group membership is unavailable or prohibited from use at deployment time. Specifically, our deployment-time collection mechanism does not reveal significantly more about the group membership of any individual sample than can be ascertained from base rates alone. To do this, we study a learner that can use a small set of labeled data to train a proxy function that can later be used for this filtering or selection task. We then associate the range of the proxy function with sampling probabilities; given a new example, we classify it using our proxy function and then select it with probability corresponding to its proxy classification. Importantly, we require that the proxy classification does not reveal significantly more information about the sensitive group membership of any individual example compared to population base rates alone (i.e., the level of disclosure should be controlled) and show that we can find such a proxy in a sample- and oracle-efficient manner. Finally, we experimentally evaluate our algorithm and analyze its generalization properties.
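The deployment-time mechanism in the abstract can be illustrated with a small Python sketch: map each example to a proxy class, then keep it with a class-specific probability. The sketch also shows the disclosure requirement. It is not the authors' algorithm. It assumes the proxy is already trained and outputs one of finitely many classes. It measures disclosure with a hypothetical additive gap |P(g | c) - P(g)| rather than the paper's own definition. Every function name, sampling probability, and count below is made up for illustration.

```python
import random
from collections import Counter


def empirical_rates(proxy_labels, groups):
    """Estimate base rates P(g) and posteriors P(g | proxy class c)
    from a small labeled sample (group labels are available only here)."""
    n = len(groups)
    base = {g: k / n for g, k in Counter(groups).items()}
    class_counts = Counter(proxy_labels)
    joint = Counter(zip(proxy_labels, groups))
    cond = {
        c: {g: joint[(c, g)] / class_counts[c] for g in base}
        for c in class_counts
    }
    return base, cond


def max_disclosure_gap(base, cond):
    """Hypothetical disclosure measure (an assumption, not the paper's):
    the largest additive gap |P(g | c) - P(g)| over classes c and groups g."""
    return max(abs(cond[c][g] - base[g]) for c in cond for g in base)


def expected_composition(proxy_labels, groups, select_prob):
    """Expected share of each group in the selected cohort when an example
    whose proxy class is c is kept with probability select_prob[c]."""
    kept = Counter()
    for c, g in zip(proxy_labels, groups):
        kept[g] += select_prob[c]
    total = sum(kept.values())
    if total == 0:
        raise ValueError("selection probabilities keep no examples")
    return {g: kept[g] / total for g in kept}


def filter_stream(proxy_labels, select_prob, rng):
    """Deployment-time filter: each keep/drop decision depends only on the
    proxy class, never on the (unavailable) group label."""
    return [i for i, c in enumerate(proxy_labels) if rng.random() < select_prob[c]]


# Toy labeled sample with made-up counts: group B is 30% of the population.
proxy = [0] * 50 + [1] * 50
groups = ["A"] * 40 + ["B"] * 10 + ["A"] * 30 + ["B"] * 20
select_prob = {0: 0.25, 1: 1.0}  # hypothetical sampling probabilities

base, cond = empirical_rates(proxy, groups)
gap = max_disclosure_gap(base, cond)
composition = expected_composition(proxy, groups, select_prob)
selected = filter_stream(proxy, select_prob, random.Random(0))
print(f"disclosure gap: {gap:.3f}")
print({g: round(s, 3) for g, s in composition.items()})
print(f"kept {len(selected)} of {len(proxy)} examples")
```

The expected share of group B among selected examples is a weighted average of the posteriors P(B | c). It therefore cannot leave the range those posteriors span, which here is 0.2 to 0.4. In the toy data, B's expected share rises from 0.30 to 0.36 when class-0 examples are kept with probability 0.25 and class-1 examples are always kept. Meanwhile, no proxy class moves a posterior more than 0.1 from the base rates. A proxy that reveals less about group membership would allow correspondingly less rebalancing.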
Related papers
- Sample size planning for conditional counterfactual mean estimation with a K-armed randomized experiment [0.0]
  We show how to determine a sufficiently large sample size for a K-armed randomized experiment.
  Using policy trees to learn sub-groups, we evaluate our nominal guarantees on a large publicly available randomized experiment test data set.
  Date: 2024-03-06T20:37:29Z
- Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
  We consider the problem of learning from data corrupted by underrepresentation bias.
  We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
  We show that our algorithm permits efficient learning for model classes of finite VC dimension.
  Date: 2023-06-19T18:25:44Z
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
  We use adversarial tools to optimize for queries that are discriminative and diverse.
  Our improvements achieve significantly more accurate membership inference than existing methods.
  Date: 2022-10-19T17:46:50Z
- Addressing Missing Sources with Adversarial Support-Matching [8.53946780558779]
  We investigate a scenario in which the absence of certain data is linked to the second level of a two-level hierarchy in the data.
  Inspired by the idea of protected groups from algorithmic fairness, we refer to the partitions carved by this second level as "subgroups".
  We make use of an additional, diverse but unlabeled dataset, called the "deployment set", to learn a representation that is invariant to subgroup.
  Date: 2022-03-24T16:19:19Z
- Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised Person Re-Identification and Text Authorship Attribution [77.85461690214551]
  Learning from fully unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
  Recent self-supervised learning methods have been shown to be effective on fully unlabeled data when the underlying classes have significant semantic differences.
  We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
  Date: 2022-02-07T13:08:11Z
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
  Spurious correlations between input samples and the target labels wrongly direct the neural network predictions.
  We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
  We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
  Date: 2022-01-10T22:04:48Z
- Ranking Models in Unlabeled New Environments [74.33770013525647]
  We introduce the problem of ranking models in unlabeled new environments.
  We use a proxy dataset that (1) is fully labeled and (2) reflects the true model rankings in a given target environment well.
  Specifically, datasets that are more similar to the unlabeled target domain are found to better preserve the relative performance rankings.
  Date: 2021-08-23T17:57:15Z
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
  We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
  We prove that even when only the input distribution is biased, models can still pick up spurious features from their training data.
  Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
  Date: 2021-06-14T05:39:09Z
- Predicting the Accuracy of a Few-Shot Classifier [3.609538870261841]
  We first analyze the reasons for the variability of generalization performance.
  We propose reasonable measures that we empirically demonstrate to be correlated with the generalization ability of the considered classifiers.
  Date: 2020-07-08T16:31:28Z
- Group Membership Verification with Privacy: Sparse or Dense? [21.365032455883178]
  Group membership verification checks whether a biometric trait corresponds to one member of a group without revealing the identity of that member.
  Recent contributions provide privacy for group membership protocols through the joint use of two mechanisms.
  This paper proposes a mathematical model for group membership verification that reveals the impact of sparsity on security, compactness, and verification performance.
  Date: 2020-02-24T16:47:19Z