Related papers: From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models

From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models

URL: http://arxiv.org/abs/2510.17247v1
Date: Mon, 20 Oct 2025 07:37:43 GMT
Title: From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models
Authors: Zefan Cai, Haoyi Qiu, Haozhe Zhao, Ke Wan, Jiachen Li, Jiuxiang Gu, Wen Xiao, Nanyun Peng, Junjie Hu,
Abstract summary: We introduce VideoBiasEval, a framework for evaluating social representation in video generation.<n>VideoBiasEval employs an event-based prompting strategy to disentangle semantic content from actor attributes.<n>We conduct the first end-to-end analysis connecting biases in human preference datasets, their amplification in reward models, and their propagation through alignment-tuned video diffusion models.
Score: 69.4332879415364
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in video diffusion models have significantly enhanced text-to-video generation, particularly through alignment tuning using reward models trained on human preferences. While these methods improve visual quality, they can unintentionally encode and amplify social biases. To systematically trace how such biases evolve throughout the alignment pipeline, we introduce VideoBiasEval, a comprehensive diagnostic framework for evaluating social representation in video generation. Grounded in established social bias taxonomies, VideoBiasEval employs an event-based prompting strategy to disentangle semantic content (actions and contexts) from actor attributes (gender and ethnicity). It further introduces multi-granular metrics to evaluate (1) overall ethnicity bias, (2) gender bias conditioned on ethnicity, (3) distributional shifts in social attributes across model variants, and (4) the temporal persistence of bias within videos. Using this framework, we conduct the first end-to-end analysis connecting biases in human preference datasets, their amplification in reward models, and their propagation through alignment-tuned video diffusion models. Our results reveal that alignment tuning not only strengthens representational biases but also makes them temporally stable, producing smoother yet more stereotyped portrayals. These findings highlight the need for bias-aware evaluation and mitigation throughout the alignment process to ensure fair and socially responsible video generation.

Related papers

FAIRT2V: Training-Free Debiasing for Text-to-Video Diffusion Models [20.319952004585616]
We present FairT2V, a training-free debiasing framework for text-to-video generation.<n>We first analyze demographic bias in T2V models and show that it primarily originates from pretrained text encoders.<n>We quantify this effect with a gender-leaning score that correlates with bias in generated videos.
arXiv Detail & Related papers (2026-01-28T17:29:53Z)
A Comprehensive Study of Implicit and Explicit Biases in Large Language Models [1.0555164678638427]
This study highlights the need to address biases in Large Language Models amid growing generative AI.<n>We studied bias-specific benchmarks such as StereoSet and CrowSPairs to evaluate the existence of various biases in multiple generative models such as BERT and GPT 3.5.<n>Results indicated fine-tuned models struggle with gender biases but excelled at identifying and avoiding racial biases.
arXiv Detail & Related papers (2025-11-18T05:27:17Z)
Exploring Bias in over 100 Text-to-Image Generative Models [49.60774626839712]
We investigate bias trends in text-to-image generative models over time, focusing on the increasing availability of models through open platforms like Hugging Face.<n>We assess bias across three key dimensions: (i) distribution bias, (ii) generative hallucination, and (iii) generative miss-rate.<n>Our findings indicate that artistic and style-transferred models exhibit significant bias, whereas foundation models, benefiting from broader training distributions, are becoming progressively less biased.
arXiv Detail & Related papers (2025-03-11T03:40:44Z)
Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.<n>FAST surpasses state-of-the-art baselines with superior debiasing performance.<n>This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
We introduce VLBiasBench, a benchmark to evaluate biases in Large Vision-Language Models (LVLMs)<n>VLBiasBench features a dataset that covers nine distinct categories of social biases, including age, disability status, gender, nationality, physical appearance, race, religion, profession, social economic status, as well as two intersectional bias categories: race x gender and race x social economic status.<n>We conduct extensive evaluations on 15 open-source models as well as two advanced closed-source models, yielding new insights into the biases present in these models.
arXiv Detail & Related papers (2024-06-20T10:56:59Z)
Finetuning Text-to-Image Diffusion Models for Fairness [43.80733100304361]
We frame fairness as a distributional alignment problem. Empirically, our method markedly reduces gender, racial, and their intersectional biases for occupational prompts. Our method supports diverse perspectives of fairness beyond absolute equality.
arXiv Detail & Related papers (2023-11-11T05:40:54Z)
Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases. We propose steps towards mitigating social biases during text generation. Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
Detection and Mitigation of Bias in Ted Talk Ratings [3.3598755777055374]
Implicit bias is a behavioral conditioning that leads us to attribute predetermined characteristics to members of certain groups. This paper quantifies implicit bias in viewer ratings of TEDTalks, a diverse social platform assessing social and professional performance.
arXiv Detail & Related papers (2020-03-02T06:13:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.