BiasDora: Exploring Hidden Biased Associations in Vision-Language Models
- URL: http://arxiv.org/abs/2407.02066v2
- Date: Thu, 03 Oct 2024 23:51:14 GMT
- Title: BiasDora: Exploring Hidden Biased Associations in Vision-Language Models
- Authors: Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu,
- Abstract summary: We investigate hidden, implicit associations across 9 bias dimensions.
We show how biased associations vary in their negativity, toxicity, and extremity.
Our work identifies subtle and extreme biases that are typically not recognized by existing methodologies.
- Score: 23.329280888159744
- License:
- Abstract: Existing works examining Vision-Language Models (VLMs) for social biases predominantly focus on a limited set of documented bias associations, such as gender:profession or race:crime. This narrow scope often overlooks a vast range of unexamined implicit associations, restricting the identification and, hence, mitigation of such biases. We address this gap by probing VLMs to (1) uncover hidden, implicit associations across 9 bias dimensions. We systematically explore diverse input and output modalities and (2) demonstrate how biased associations vary in their negativity, toxicity, and extremity. Our work (3) identifies subtle and extreme biases that are typically not recognized by existing methodologies. We make the Dataset of retrieved associations, (Dora), publicly available here https://github.com/chahatraj/BiasDora.
Related papers
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z) - GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-emphVL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
VLBiasBench is a benchmark aimed at evaluating biases in Large Vision-Language Models (LVLMs)
We construct a dataset encompassing nine distinct categories of social biases, including age, disability status, gender, nationality, physical appearance, race, religion, profession, social economic status and two intersectional bias categories (race x gender, and race x social economic status)
We conduct extensive evaluations on 15 open-source models as well as one advanced closed-source model, providing some new insights into the biases revealing from these models.
arXiv Detail & Related papers (2024-06-20T10:56:59Z) - Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models [10.73340009530019]
This study addresses two such biases within Large Language Models (LLMs): representative bias and affinity bias.
We introduce two novel metrics to measure these biases: the Representative Bias Score (RBS) and the Affinity Bias Score (ABS)
Our analysis uncovers marked representative biases in prominent LLMs, with a preference for identities associated with being white, straight, and men.
Our investigation of affinity bias reveals distinctive evaluative patterns within each model, akin to bias fingerprints'
arXiv Detail & Related papers (2024-05-23T13:35:34Z) - What Do Llamas Really Think? Revealing Preference Biases in Language
Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether this bias is encoded in its latent representations.
We propose a logistic Bradley-Terry probe which predicts word pair preferences of LLMs from the words' hidden vectors.
arXiv Detail & Related papers (2023-11-30T18:53:13Z) - Social Bias Probing: Fairness Benchmarking for Language Models [38.180696489079985]
This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment.
We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections.
We show that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized.
arXiv Detail & Related papers (2023-11-15T16:35:59Z) - "Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in
LLM-Generated Reference Letters [97.11173801187816]
Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content.
This paper critically examines gender biases in LLM-generated reference letters.
arXiv Detail & Related papers (2023-10-13T16:12:57Z) - Probing Intersectional Biases in Vision-Language Models with
Counterfactual Examples [5.870913541790421]
We employ text-to-image diffusion models to produce counterfactual examples for probing intserctional social biases at scale.
Our approach utilizes Stable Diffusion with cross attention control to produce sets of counterfactual image-text pairs.
We conduct extensive experiments using our generated dataset which reveal the intersectional social biases present in state-of-the-art VLMs.
arXiv Detail & Related papers (2023-10-04T17:25:10Z) - Evaluating Biased Attitude Associations of Language Models in an
Intersectional Context [2.891314299138311]
Language models are trained on large-scale corpora that embed implicit biases documented in psychology.
We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight.
We find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language.
arXiv Detail & Related papers (2023-07-07T03:01:56Z) - SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense
Reasoning Models [22.13138599547492]
We propose SODAPOP (SOcial bias Discovery from Answers about PeOPle) in social commonsense question-answering.
By using a social commonsense model to score the generated distractors, we are able to uncover the model's stereotypic associations between demographic groups and an open set of words.
We also test SODAPOP on debiased models and show the limitations of multiple state-of-the-art debiasing algorithms.
arXiv Detail & Related papers (2022-10-13T18:04:48Z) - The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.