Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive Attributes
- URL: http://arxiv.org/abs/2210.03175v2
- Date: Tue, 31 Jan 2023 02:13:12 GMT
- Title: Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive Attributes
- Authors: Zhaowei Zhu, Yuanshun Yao, Jiankai Sun, Hang Li, Yang Liu
- Abstract summary: We develop an algorithm that is able to measure fairness (provably) accurately with only three properly identified proxies.
Our results imply a set of practical guidelines for practitioners on how to use proxies properly.
- Score: 25.730297492625507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluating fairness can be challenging in practice because the sensitive
attributes of data are often inaccessible due to privacy constraints. The go-to
approach that the industry frequently adopts is using off-the-shelf proxy
models to predict the missing sensitive attributes, e.g. Meta [Alao et al.,
2021] and Twitter [Belli et al., 2022]. Despite its popularity, there are three
important questions unanswered: (1) Is directly using proxies efficacious in
measuring fairness? (2) If not, is it possible to accurately evaluate fairness
using proxies only? (3) Given the ethical controversy over inferring user
private information, is it possible to only use weak (i.e. inaccurate) proxies
in order to protect privacy? First, our theoretical analyses show that directly
using proxy models can give a false sense of (un)fairness. Second, we develop an
algorithm that is able to measure fairness (provably) accurately with only
three properly identified proxies. Third, we show that our algorithm allows the
use of only weak proxies (e.g. with only 68.85% accuracy on COMPAS), adding an
extra layer of protection on user privacy. Experiments validate our theoretical
analyses and show our algorithm can effectively measure and mitigate bias. Our
results imply a set of practical guidelines for practitioners on how to use
proxies properly. Code is available at github.com/UCSC-REAL/fair-eval.
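To make question (1) concrete, below is a minimal Python sketch of the "go-to" evaluation the abstract criticizes: computing a demographic parity gap with proxy-predicted group labels. It is not the paper's estimator (see github.com/UCSC-REAL/fair-eval for that); the synthetic data, proxy error rates, and names are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical ground truth: binary sensitive attribute and audited decisions.
a_true = rng.binomial(1, 0.4, n)                    # true (unobserved) group membership
p_accept = np.where(a_true == 1, 0.45, 0.60)        # decision rates differ across groups
y_hat = rng.binomial(1, p_accept)                   # decisions of the classifier under audit

# Weak proxy: predicts the sensitive attribute with group-dependent error.
flip = rng.binomial(1, np.where(a_true == 1, 0.35, 0.15))
a_proxy = np.where(flip == 1, 1 - a_true, a_true)   # roughly 77% accurate overall

def dp_gap(decisions, groups):
    """Demographic parity gap: |P(decision=1 | g=0) - P(decision=1 | g=1)|."""
    return abs(decisions[groups == 0].mean() - decisions[groups == 1].mean())

print(f"DP gap with true groups : {dp_gap(y_hat, a_true):.3f}")
print(f"DP gap with proxy groups: {dp_gap(y_hat, a_proxy):.3f}")  # noticeably smaller
```

Because the proxy mislabels the two groups at different rates, each proxy-defined group mixes members of both true groups, so the measured gap shrinks even though the underlying disparity is unchanged, which is the "false sense of (un)fairness" described above.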
Related papers
- Proxy Discrimination After Students for Fair Admissions [0.0]
The article develops a test for regulating the use of variables that proxy for race and other protected classes and classifications.
It suggests that lawmakers can develop caps on permissible proxy power over time, as courts and algorithm builders learn more about the power of variables.
arXiv Detail & Related papers (2025-01-07T17:13:24Z)
- Towards Harmless Rawlsian Fairness Regardless of Demographic Prior [57.30787578956235]
We explore the potential for achieving fairness without compromising utility when no demographic information is provided for the training set.
We propose a simple but effective method named VFair to minimize the variance of training losses inside the optimal set of empirical losses (a minimal sketch of the variance-penalty idea appears after this list).
arXiv Detail & Related papers (2024-11-04T12:40:34Z)
- Preserving Fairness Generalization in Deepfake Detection [14.485069525871504]
Deepfake detection models can exhibit unfair performance disparities across demographic groups such as race and gender.
We propose the first method to address the fairness generalization problem in deepfake detection by simultaneously considering features, loss, and optimization aspects.
Our method employs disentanglement learning to extract demographic and domain-agnostic features, fusing them to encourage fair learning across a flattened loss landscape.
arXiv Detail & Related papers (2024-02-27T05:47:33Z)
- Fairness Without Harm: An Influence-Guided Active Sampling Approach [32.173195437797766]
We aim to train models that mitigate group fairness disparity without causing harm to model accuracy.
Current data acquisition methods, such as fair active learning approaches, typically require annotating sensitive attributes.
We propose a tractable active data sampling algorithm that does not rely on training group annotations.
arXiv Detail & Related papers (2024-02-20T07:57:38Z)
- Better Fair than Sorry: Adversarial Missing Data Imputation for Fair GNNs [5.655251163654288]
We propose Better Fair than Sorry (BFtS), a fair missing-data imputation model for protected attributes.
The key design principle behind BFtS is that imputations should approximate the worst-case scenario for fairness.
Experiments using synthetic and real datasets show that BFtS often achieves a better fairness-accuracy trade-off than existing alternatives.
arXiv Detail & Related papers (2023-11-02T20:57:44Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z)
- Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking so that ownership of the protected datasets can be verified.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
arXiv Detail & Related papers (2022-08-04T05:32:20Z)
- Longitudinal Fairness with Censorship [1.5688552250473473]
We devise applicable fairness measures, propose a debiasing algorithm, and provide necessary theoretical constructs to bridge fairness with and without censorship.
Our experiments on four censored datasets confirm the utility of our approach.
arXiv Detail & Related papers (2022-03-30T03:08:40Z)
- Non-isotropy Regularization for Proxy-based Deep Metric Learning [78.18860829585182]
We propose non-isotropy regularization ($\mathbb{NIR}$) for proxy-based Deep Metric Learning.
This allows us to explicitly induce a non-isotropic distribution of samples around a proxy to optimize for.
Experiments highlight consistent generalization benefits of $\mathbb{NIR}$ while achieving competitive and state-of-the-art performance.
arXiv Detail & Related papers (2022-03-16T11:13:20Z)
- On Assessing the Usefulness of Proxy Domains for Developing and Evaluating Embodied Agents [0.0]
We argue that the value of a proxy is conditioned on the task that it is being used to help solve.
We establish new proxy usefulness (PU) metrics to compare the usefulness of different proxy domains.
arXiv Detail & Related papers (2021-09-29T16:04:39Z)
- Multiaccurate Proxies for Downstream Fairness [20.36220509798361]
We study the problem of training a model that must obey demographic fairness conditions when the sensitive features are not available at training time.
We adopt a fairness pipeline perspective, in which an "upstream" learner that does have access to the sensitive features will learn a proxy model for these features from the other attributes.
We show that obeying multiaccuracy constraints with respect to the downstream model class suffices for this purpose (a sketch of multiaccuracy-style post-processing appears after this list).
arXiv Detail & Related papers (2021-07-09T13:16:44Z)
- Is Private Learning Possible with Instance Encoding? [68.84324434746765]
We study whether a non-private learning algorithm can be made private by relying on an instance-encoding mechanism.
We formalize both the notion of instance encoding and its privacy by providing two attack models.
arXiv Detail & Related papers (2020-11-10T18:55:20Z)
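Regarding the "Towards Harmless Rawlsian Fairness" entry above: VFair is summarized as minimizing the variance of training losses inside the set of empirically optimal models. The following is only a minimal sketch of the core ingredient, a variance penalty on per-sample losses, applied to a toy linear model; it is not the authors' implementation, and the model, data, and hyperparameters (e.g. lam) are assumptions made for illustration.

```python
import torch

def variance_penalized_loss(per_sample_losses: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Mean loss plus a penalty on the variance of per-sample losses.

    Penalizing loss variance pushes the model toward similar losses across
    samples (and hence across unobserved groups) without demographic labels.
    """
    return per_sample_losses.mean() + lam * per_sample_losses.var(unbiased=False)

# Toy usage: a linear classifier trained with the penalized objective.
torch.manual_seed(0)
X = torch.randn(512, 10)
y = (X[:, 0] + 0.5 * torch.randn(512) > 0).float()
w = torch.zeros(10, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)
bce = torch.nn.BCEWithLogitsLoss(reduction="none")

for _ in range(200):
    opt.zero_grad()
    per_sample = bce(X @ w, y)                       # one loss value per example
    variance_penalized_loss(per_sample, lam=2.0).backward()
    opt.step()
```

Restricting the penalty to the optimal set of empirical losses, as the paper does, is what keeps the method harmless to utility; the plain objective above simply trades mean loss against loss variance through lam.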
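Regarding the "Multiaccurate Proxies for Downstream Fairness" entry above: multiaccuracy asks that the proxy's residuals be unpredictable by any function in a reference class (related to the downstream model class). A standard way to approximate this is a boosting-style post-processing loop. The sketch below is hypothetical: it uses linear auditors, synthetic data, and invented function names, and is not the paper's procedure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def multiaccuracy_postprocess(X, a, proxy_scores, rounds=20, lr=0.5, tol=1e-3):
    """Boosting-style post-processing of proxy scores toward multiaccuracy.

    X            : features available to the upstream auditor
    a            : true sensitive attribute (available only upstream)
    proxy_scores : initial proxy estimates of P(a = 1 | X)
    """
    p = proxy_scores.astype(float)
    for _ in range(rounds):
        residual = a - p                               # where the proxy is systematically off
        auditor = LinearRegression().fit(X, residual)  # best linear predictor of that error
        correction = auditor.predict(X)
        if np.abs(correction).mean() < tol:            # residual no longer predictable
            break
        p = np.clip(p + lr * correction, 0.0, 1.0)     # shift scores to cancel the error
    return p

# Toy usage with synthetic data and a deliberately weak starting proxy.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
a = (X[:, 0] + rng.normal(scale=0.5, size=2000) > 0).astype(float)
crude_proxy = np.full(2000, a.mean())
refined = multiaccuracy_postprocess(X, a, crude_proxy)
print("mean |a - proxy| before:", np.abs(a - crude_proxy).mean())
print("mean |a - proxy| after :", np.abs(a - refined).mean())
```

The upstream learner, which does see the true attribute a, repeatedly fits an auditor to the residual and shifts the proxy scores until no auditor in the class can find remaining systematic error.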
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences arising from its use.