Subverting Fair Image Search with Generative Adversarial Perturbations
- URL: http://arxiv.org/abs/2205.02414v2
- Date: Fri, 6 May 2022 19:54:48 GMT
- Title: Subverting Fair Image Search with Generative Adversarial Perturbations
- Authors: Avijit Ghosh, Matthew Jagielski, Christo Wilson
- Abstract summary: We present a case study in which we attack a state-of-the-art, fairness-aware image search engine.
These perturbations attempt to cause the fair re-ranking algorithm to unfairly boost the rank of images containing people from an adversary-selected subpopulation.
We demonstrate that our attacks are robust across a number of variables, that they have close to zero impact on the relevance of search results, and that they succeed under a strict threat model.
- Score: 14.669429931620689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we explore the intersection of fairness and robustness in the
context of ranking: when a ranking model has been calibrated to achieve some
definition of fairness, is it possible for an external adversary to make the
ranking model behave unfairly without having access to the model or training
data? To investigate this question, we present a case study in which we develop
and then attack a state-of-the-art, fairness-aware image search engine using
images that have been maliciously modified using a Generative Adversarial
Perturbation (GAP) model. These perturbations attempt to cause the fair
re-ranking algorithm to unfairly boost the rank of images containing people
from an adversary-selected subpopulation.
We present results from extensive experiments demonstrating that our attacks
can successfully confer significant unfair advantage to people from the
majority class relative to fairly-ranked baseline search results. We
demonstrate that our attacks are robust across a number of variables, that they
have close to zero impact on the relevance of search results, and that they
succeed under a strict threat model. Our findings highlight the danger of
deploying fair machine learning algorithms in-the-wild when (1) the data
necessary to achieve fairness may be adversarially manipulated, and (2) the
models themselves are not robust against attacks.
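To make the attack mechanism concrete, below is a minimal, hypothetical sketch (not the authors' released code) of a generative adversarial perturbation (GAP) generator in PyTorch: a small network maps an image to an L_inf-bounded perturbation that is added back to the image. The layer sizes, the `epsilon` budget, and the training objective described in the comments are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: a minimal GAP-style perturbation generator.
# All architectural choices and the epsilon budget are assumptions.
import torch
import torch.nn as nn


class PerturbationGenerator(nn.Module):
    def __init__(self, epsilon: float = 8 / 255):
        super().__init__()
        self.epsilon = epsilon  # L_inf perturbation budget (assumed value)
        # A small fully convolutional network; a real GAP generator is
        # typically a deeper encoder-decoder architecture.
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Tanh(),  # output in [-1, 1]
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Scale the tanh output into the epsilon ball and add it to the image,
        # keeping pixel values valid.
        delta = self.epsilon * self.net(images)
        return torch.clamp(images + delta, 0.0, 1.0)


if __name__ == "__main__":
    gen = PerturbationGenerator()
    batch = torch.rand(4, 3, 224, 224)  # dummy images in [0, 1]
    perturbed = gen(batch)
    print((perturbed - batch).abs().max())  # stays within the epsilon budget
```

In the setting described by the abstract, such a generator would be trained so that perturbed images of the adversary-selected subpopulation are re-ranked more favorably by the (surrogate) search pipeline while the perturbations remain visually imperceptible; the exact losses and surrogate models are specified in the paper itself.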
Related papers
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
arXiv Detail & Related papers (2024-02-28T07:54:50Z) - DualFair: Fair Representation Learning at Both Group and Individual
Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations.
Our model jointly optimizes for two fairness criteria: group fairness and counterfactual fairness.
arXiv Detail & Related papers (2023-03-15T07:13:54Z) - Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness [15.059419033330126]
We present a novel strategy, called Fair Diffusion, to attenuate biases after the deployment of generative text-to-image models.
Specifically, we demonstrate shifting a bias, based on human instructions, in any direction, yielding arbitrary new proportions for, e.g., identity groups.
This control enables instructing generative image models on fairness, without any data filtering or additional training.
arXiv Detail & Related papers (2023-02-07T18:25:28Z) - Order-Disorder: Imitation Adversarial Attacks for Black-box Neural
Ranking Models [48.93128542994217]
We propose an imitation adversarial attack on black-box neural passage ranking models.
We show that the target passage ranking model can be made transparent and imitated by enumerating critical queries/candidates.
We also propose an innovative gradient-based attack method, empowered by the pairwise objective function, to generate adversarial triggers.
arXiv Detail & Related papers (2022-09-14T09:10:07Z) - A Tale of HodgeRank and Spectral Method: Target Attack Against Rank
Aggregation Is the Fixed Point of Adversarial Game [153.74942025516853]
The intrinsic vulnerability of rank aggregation methods is not well studied in the literature.
In this paper, we focus on a purposeful adversary who seeks to control the aggregated results by modifying the pairwise data.
The effectiveness of the suggested target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments.
arXiv Detail & Related papers (2022-09-13T05:59:02Z) - Revealing Unfair Models by Mining Interpretable Evidence [50.48264727620845]
The popularity of machine learning has increased the risk of unfair models being deployed in high-stakes applications.
In this paper, we tackle the novel task of revealing unfair models by mining interpretable evidence.
Our method finds highly interpretable and solid evidence to effectively reveal the unfairness of trained models.
arXiv Detail & Related papers (2022-07-12T20:03:08Z) - Fair Group-Shared Representations with Normalizing Flows [68.29997072804537]
We develop a fair representation learning algorithm that is able to map individuals belonging to different groups into a single group.
We show experimentally that our methodology is competitive with other fair representation learning algorithms.
arXiv Detail & Related papers (2022-01-17T10:49:49Z) - Evaluating Adversarial Attacks on ImageNet: A Reality Check on
Misclassification Classes [3.0128052969792605]
We investigate the nature of the classes into which adversarial examples are misclassified in ImageNet.
We find that 71% of the adversarial examples that achieve model-to-model adversarial transferability are misclassified into one of the top-5 classes.
We also find that a large subset of untargeted misclassifications are, in fact, misclassifications into semantically similar classes.
arXiv Detail & Related papers (2021-11-22T08:54:34Z) - Ethical Adversaries: Towards Mitigating Unfairness with Adversarial
Machine Learning [8.436127109155008]
Individuals, as well as organisations, notice, test, and criticize unfair results to hold model designers and deployers accountable.
We offer a framework that assists these groups in mitigating unfair representations stemming from the training datasets.
Our framework relies on two inter-operating adversaries to improve fairness.
arXiv Detail & Related papers (2020-05-14T10:10:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.