Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback
- URL: http://arxiv.org/abs/2406.05902v1
- Date: Sun, 9 Jun 2024 19:42:25 GMT
- Title: Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback
- Authors: Emilia Agis Lerner, Florian E. Dorner, Elliott Ash, Naman Goel
- Abstract summary: We find significant gaps in fairness preferences depending on the race, age, political stance, educational level, and LGBTQ+ identity of annotators.
We also demonstrate that demographics mentioned in text have a strong influence on how users perceive individual fairness in moderation.
- Score: 8.04095222893591
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a growing body of work on learning from human feedback to align various aspects of machine learning systems with human values and preferences. We consider the setting of fairness in content moderation, in which human feedback is used to determine how two comments -- referencing different sensitive attribute groups -- should be treated in comparison to one another. With a novel dataset collected from Prolific and MTurk, we find significant gaps in fairness preferences depending on the race, age, political stance, educational level, and LGBTQ+ identity of annotators. We also demonstrate that demographics mentioned in text have a strong influence on how users perceive individual fairness in moderation. Further, we find that differences also exist in downstream classifiers trained to predict human preferences. Finally, we observe that an ensemble giving equal weight to classifiers trained on annotations from different demographics performs better across demographic intersections than a single classifier that gives equal weight to each annotation.
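The final observation lends itself to a short sketch. Below is a minimal illustration of the comparison, assuming per-group annotation sets and a generic scikit-learn classifier; the feature representation, model class, and group definitions are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_demographic_ensemble(X_by_group, y_by_group):
    """Fit one preference classifier per annotator demographic group."""
    return {g: LogisticRegression(max_iter=1000).fit(X_by_group[g], y_by_group[g])
            for g in X_by_group}

def ensemble_predict(models, X):
    """Equal-weight average of per-group predicted probabilities."""
    probs = np.mean([m.predict_proba(X)[:, 1] for m in models.values()], axis=0)
    return (probs >= 0.5).astype(int)

def train_pooled_classifier(X_by_group, y_by_group):
    """Baseline: a single classifier giving equal weight to every annotation."""
    X = np.vstack(list(X_by_group.values()))
    y = np.concatenate(list(y_by_group.values()))
    return LogisticRegression(max_iter=1000).fit(X, y)
```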
Related papers
- FineFACE: Fair Facial Attribute Classification Leveraging Fine-grained Features [3.9440964696313485]
Research highlights the presence of demographic bias in automated facial attribute classification algorithms.
Existing bias mitigation techniques typically require demographic annotations and often incur a trade-off between fairness and accuracy.
This paper proposes a novel approach to fair facial attribute classification by framing it as a fine-grained classification problem.
arXiv Detail & Related papers (2024-08-29T20:08:22Z)
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Pretraining (CLIP)
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
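One concrete probe in this spirit, sketched below: compare CLIP's zero-shot classification accuracy across demographic groups and flag large gaps as evidence of bias. The checkpoint name, prompts, and the `grouped_data` structure are assumptions for illustration, not the paper's protocol.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def zero_shot_accuracy(images, labels, class_prompts):
    """Fraction of images whose highest-scoring prompt matches the label."""
    inputs = processor(text=class_prompts, images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # (n_images, n_prompts)
    return (logits.argmax(dim=-1) == torch.tensor(labels)).float().mean().item()

def accuracy_gap(grouped_data, class_prompts):
    """grouped_data maps group name -> (images, labels); returns max-min gap."""
    accs = {g: zero_shot_accuracy(imgs, ys, class_prompts)
            for g, (imgs, ys) in grouped_data.items()}
    return max(accs.values()) - min(accs.values()), accs
```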
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
- Human-Guided Fair Classification for Natural Language Processing [9.652938946631735]
We show how to leverage unsupervised style transfer and GPT-3's zero-shot capabilities to generate semantically similar sentences that differ along sensitive attributes.
We validate the generated pairs via an extensive crowdsourcing study, which confirms that many of these pairs align with human intuition about fairness in the context of toxicity classification.
arXiv Detail & Related papers (2022-12-20T10:46:40Z)
- Deep Learning on a Healthy Data Diet: Finding Important Examples for Fairness [15.210232622716129]
Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes.
Data augmentation reduces gender bias by adding counterfactual examples to the training dataset.
We show that some examples in the augmented dataset can be unimportant, or even harmful, for fairness.
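For context, here is a minimal sketch of the counterfactual augmentation being critiqued: naive token-level swapping of gendered words. The word list is an illustrative assumption, and the paper's point is precisely that pairs generated this way are not uniformly useful.

```python
# Naive counterfactual gender augmentation; ignores he/his-style ambiguity.
GENDER_SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
                "his": "her", "man": "woman", "woman": "man"}

def counterfactual(text: str) -> str:
    """Swap gendered tokens to produce a mirrored training example."""
    return " ".join(GENDER_SWAPS.get(tok.lower(), tok) for tok in text.split())

dataset = [("he is a good doctor", 0), ("she was rude to him", 1)]
augmented = dataset + [(counterfactual(t), y) for t, y in dataset]
```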
arXiv Detail & Related papers (2022-11-20T22:42:30Z)
- Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation [23.661509482014058]
We grapple with questions that arise along three stages of the machine learning pipeline when incorporating intersectionality as multiple demographic attributes.
We advocate for supplementing domain knowledge with empirical validation when choosing which demographic attribute labels to train on.
We warn against using data imbalance techniques without considering their normative implications.
arXiv Detail & Related papers (2022-05-10T01:00:52Z)
- On Disentangled and Locally Fair Representations [95.6635227371479]
We study the problem of performing classification in a manner that is fair for sensitive groups, such as race and gender.
We learn a locally fair representation, such that, under the learned representation, the neighborhood of each sample is balanced in terms of the sensitive attribute.
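A hedged sketch of how one might check that local balance property in a learned representation, using k-nearest neighbors. The specific metric here (mean deviation of neighborhood group rates from the base rate) is an illustrative reading of "balanced", not the paper's objective.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_imbalance(Z, s, k=10):
    """Z: (n, d) learned representations; s: (n,) binary sensitive attribute."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(Z)
    _, idx = nn.kneighbors(Z)                 # column 0 is the point itself
    local_rates = s[idx[:, 1:]].mean(axis=1)  # group rate in each neighborhood
    return np.abs(local_rates - s.mean()).mean()  # 0 = perfectly balanced
```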
arXiv Detail & Related papers (2022-05-05T14:26:50Z)
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
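A minimal sketch of a prediction-sensitivity style metric in PyTorch: accumulate the gradient of the model's score with respect to the input, weighted to emphasize protected features. The exact weighting in the paper's ACCUMULATED PREDICTION SENSITIVITY differs; this only conveys the mechanism.

```python
import torch

def prediction_sensitivity(model, x, feature_weights):
    """x: (d,) input; feature_weights: (d,) emphasis on protected features."""
    x = x.clone().requires_grad_(True)
    score = model(x.unsqueeze(0)).squeeze()   # scalar prediction score
    grad = torch.autograd.grad(score, x)[0]   # d(score)/d(input features)
    return torch.sum(feature_weights * grad.abs()).item()
```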
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
- Fair Group-Shared Representations with Normalizing Flows [68.29997072804537]
We develop a fair representation learning algorithm which is able to map individuals belonging to different groups into a single group.
We show experimentally that our methodology is competitive with other fair representation learning algorithms.
arXiv Detail & Related papers (2022-01-17T10:49:49Z)
- MultiFair: Multi-Group Fairness in Machine Learning [52.24956510371455]
We study multi-group fairness in machine learning (MultiFair).
We propose a generic end-to-end algorithmic framework to solve it.
Our proposed framework is generalizable to many different settings.
arXiv Detail & Related papers (2021-05-24T02:30:22Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn ⟨sentiment, aspect⟩ joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- Towards classification parity across cohorts [16.21248370949611]
This research work aims to achieve classification parity across explicit as well as implicit sensitive features.
We obtain implicit cohorts by clustering embeddings of each individual trained on the language generated by them using a language model.
We improve classification parity by introducing a modification to the loss function that minimizes the range of model performances across cohorts.
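That loss modification lends itself to a short sketch: add the range (max minus min) of per-cohort mean losses as a penalty on top of the base loss. The cohort labels, base loss, and penalty weight `lam` are assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def parity_loss(logits, labels, cohorts, lam=1.0):
    """Cross-entropy plus the range of mean per-cohort losses."""
    per_example = F.cross_entropy(logits, labels, reduction="none")
    # Assumes every cohort in `cohorts` appears at least once in the batch.
    cohort_means = torch.stack([per_example[cohorts == c].mean()
                                for c in torch.unique(cohorts)])
    return per_example.mean() + lam * (cohort_means.max() - cohort_means.min())
```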
arXiv Detail & Related papers (2020-05-16T16:31:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.