Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
- URL: http://arxiv.org/abs/2511.18123v1
- Date: Sat, 22 Nov 2025 17:04:30 GMT
- Title: Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
- Authors: Dachuan Zhao, Weiyue Li, Zhenda Shen, Yushu Qiu, Bowen Xu, Haoyu Chen, Yongchao Chen,
- Abstract summary: Vision-Language Models (VLMs) have become indispensable for multimodal reasoning, yet their representations often encode and amplify demographic biases.<n>We propose a geometrically principled framework that identifies and removes the entire subspace of linearly decodable bias.<n>Our method achieves more robust debiasing with an average improvement of $18.5%$ across four fairness metrics.
- Score: 13.46898179372249
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-Language Models (VLMs) have become indispensable for multimodal reasoning, yet their representations often encode and amplify demographic biases, resulting in biased associations and misaligned predictions in downstream tasks. Such behavior undermines fairness and distorts the intended alignment between vision and language. Recent post-hoc approaches attempt to mitigate bias by replacing the most attribute-correlated embedding coordinates with neutral values. However, our systematic analysis reveals three critical failures of this coordinate-wise approach: feature entanglement, poor cross-dataset generalization, and incomplete bias removal. We find that bias is not localized to a few coordinates but is instead distributed across a few linear subspaces. To address these limitations, we propose $\textbf{S}$ubspace $\textbf{P}$rojection $\textbf{D}$ebiasing ($\textbf{SPD}$), a geometrically principled framework that identifies and removes the entire subspace of linearly decodable bias while reinserting a neutral mean component to preserve semantic fidelity. Extensive experiments across zero-shot classification, text-to-image retrieval, and image generation validate the effectiveness of SPD: our method achieves more robust debiasing with an average improvement of $18.5\%$ across four fairness metrics, while maintaining minimal loss in task performance compared to the best debiasing baseline.
Related papers
- BLADE: Bias-Linked Adaptive DEbiasing [2.7352017408152083]
BLADE is a generative debiasing framework that requires no prior knowledge of bias or bias-conflicting samples.<n>We evaluate BLADE on multiple benchmark datasets and show that it significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-10-05T12:28:54Z) - Mitigate One, Skew Another? Tackling Intersectional Biases in Text-to-Image Models [73.20190633746442]
We introduce BiasConnect, a novel tool for analyzing and quantifying bias interactions in text-to-image models.<n>We propose InterMit, an intersectional bias mitigation algorithm guided by user-defined target distributions and priority weights.
arXiv Detail & Related papers (2025-05-22T20:56:38Z) - Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models [15.53216696218776]
We explore the internal mechanisms of how bias emerges in large language models (LLMs) when provided with ambiguous comparative prompts.
We propose $textttATLAS$, a technique to localize bias to specific layers of the LLM by analyzing attention scores and then reduce bias by scaling attention in these biased layers.
arXiv Detail & Related papers (2024-10-29T20:15:56Z) - With a Grain of SALT: Are LLMs Fair Across Social Dimensions? [3.5001789247699535]
This paper presents a systematic analysis of biases in open-source Large Language Models (LLMs) across gender, religion, and race.<n>We use the SALT dataset, which incorporates five distinct bias triggers: General Debate, Positioned Debate, Career Advice, Problem Solving, and CV Generation.<n>Our findings reveal consistent polarization across models, with certain demographic groups receiving systematically favorable or unfavorable treatment.
arXiv Detail & Related papers (2024-10-16T12:22:47Z) - Towards Real-world Debiasing: Rethinking Evaluation, Challenge, and Solution [17.080528126651977]
We introduce two novel real-world-inspired biases to bridge this gap and build a systematic evaluation framework for real-world debiasing.<n>We propose a simple yet effective approach named Debias in Destruction (DiD) to address the challenge.
arXiv Detail & Related papers (2024-05-24T06:06:41Z) - Marginal Debiased Network for Fair Visual Recognition [59.05212866862219]
We propose a novel marginal debiased network (MDN) to learn debiased representations.
Our MDN can achieve a remarkable performance on under-represented samples.
arXiv Detail & Related papers (2024-01-04T08:57:09Z) - Revisiting Evaluation Metrics for Semantic Segmentation: Optimization
and Evaluation of Fine-grained Intersection over Union [113.20223082664681]
We propose the use of fine-grained mIoUs along with corresponding worst-case metrics.
These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing.
Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects.
arXiv Detail & Related papers (2023-10-30T03:45:15Z) - Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic
Contrast Sets [52.77024349608834]
Vision-language models can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet.
COCO Captions is the most commonly used dataset for evaluating bias between background context and the gender of people in-situ.
We propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets.
arXiv Detail & Related papers (2023-05-24T17:59:18Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z) - Self-supervised debiasing using low rank regularization [59.84695042540525]
Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability.
We propose a self-supervised debiasing framework potentially compatible with unlabeled samples.
Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines.
arXiv Detail & Related papers (2022-10-11T08:26:19Z) - BiasEnsemble: Revisiting the Importance of Amplifying Bias for Debiasing [31.665352191081357]
"Debiasing" aims to train a classifier to be less susceptible to dataset bias.
$f_B$ is trained to focus on bias-aligned samples while $f_D$ is mainly trained with bias-conflicting samples.
We propose a novel biased sample selection method BiasEnsemble which removes the bias-conflicting samples.
arXiv Detail & Related papers (2022-05-29T07:55:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.