From n-gram to Attention: How Model Architectures Learn and Propagate Bias in Language Modeling
- URL: http://arxiv.org/abs/2505.12381v1
- Date: Sun, 18 May 2025 11:55:05 GMT
- Title: From n-gram to Attention: How Model Architectures Learn and Propagate Bias in Language Modeling
- Authors: Mohsinul Kabir, Tasfia Tahsin, Sophia Ananiadou
- Abstract summary: We study how data, model design choices, and temporal dynamics affect bias propagation during language modeling. Our findings highlight the need for a holistic approach -- tracing bias to its origins across both data and model dimensions, not just symptoms, to mitigate harm.
- Score: 17.673012459377375
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current research on bias in language models (LMs) predominantly focuses on data quality, with significantly less attention paid to model architecture and temporal influences of data. Even more critically, few studies systematically investigate the origins of bias. We propose a methodology grounded in comparative behavioral theory to interpret the complex interaction between training data and model architecture in bias propagation during language modeling. Building on recent work that relates transformers to n-gram LMs, we evaluate how data, model design choices, and temporal dynamics affect bias propagation. Our findings reveal that: (1) n-gram LMs are highly sensitive to context window size in bias propagation, while transformers demonstrate architectural robustness; (2) the temporal provenance of training data significantly affects bias; and (3) different model architectures respond differentially to controlled bias injection, with certain biases (e.g. sexual orientation) being disproportionately amplified. As language models become ubiquitous, our findings highlight the need for a holistic approach -- tracing bias to its origins across both data and model dimensions, not just symptoms, to mitigate harm.
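The abstract's first finding, that n-gram LMs are far more sensitive to context window size than transformers, can be made concrete with a small probe. The Python sketch below is purely illustrative and is not the paper's evaluation code: the toy corpus, the function names, and the gender/occupation probe are assumptions of mine. It only shows how the association estimated by a maximum-likelihood n-gram model between a gendered context and occupation words can shift as the n-gram order (and hence the context window) changes.

```python
# Illustrative sketch only: a toy n-gram probe, not the paper's methodology.
from collections import Counter

def ngram_counts(tokens, n):
    """Count (n-1)-token contexts and their next-token continuations."""
    context_counts, continuation_counts = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        continuation = tokens[i + n - 1]
        context_counts[context] += 1
        continuation_counts[(context, continuation)] += 1
    return context_counts, continuation_counts

def conditional_prob(tokens, context, word, n):
    """P(word | last n-1 tokens of context) under a maximum-likelihood n-gram estimate."""
    ctx_key = tuple(context[-(n - 1):]) if n > 1 else tuple()
    context_counts, continuation_counts = ngram_counts(tokens, n)
    if context_counts[ctx_key] == 0:
        return 0.0
    return continuation_counts[(ctx_key, word)] / context_counts[ctx_key]

# Hypothetical toy corpus; the sentences are purely illustrative.
corpus = ("she is a nurse . he is a doctor . she is a doctor . "
          "he is a nurse . she is a nurse .").split()

# Compare how the estimated occupation distribution after a feminine context
# changes with the n-gram order (i.e. the size of the context window).
for n in (2, 4):
    p_nurse = conditional_prob(corpus, ["she", "is", "a"], "nurse", n)
    p_doctor = conditional_prob(corpus, ["she", "is", "a"], "doctor", n)
    print(f"n={n}: P(nurse | context) = {p_nurse:.2f}, P(doctor | context) = {p_doctor:.2f}")
```

On this toy corpus the bigram estimate conditions only on the token "a" and so mixes the "she" and "he" sentences, while the 4-gram estimate conditions on "she is a" and yields a different, more gendered association; this window-size dependence is the kind of sensitivity the paper attributes to n-gram LMs, in contrast to the architectural robustness it reports for transformers.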
Related papers
- Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs [51.00909549291524]
Large language models (LLMs) exhibit cognitive biases. These biases vary across models and can be amplified by instruction tuning. It remains unclear if these differences in biases stem from pretraining, finetuning, or even random noise.
arXiv Detail & Related papers (2025-07-09T18:01:14Z)
- BiasConnect: Investigating Bias Interactions in Text-to-Image Models [73.76853483463836]
We introduce BiasConnect, a novel tool designed to analyze and quantify bias interactions in Text-to-Image models. Our method provides empirical estimates that indicate how other bias dimensions shift toward or away from an ideal distribution when a given bias is modified. We demonstrate the utility of BiasConnect for selecting optimal bias mitigation axes, comparing different TTI models on the dependencies they learn, and understanding the amplification of intersectional societal biases in TTI models.
arXiv Detail & Related papers (2025-03-12T19:01:41Z)
- Biased Heritage: How Datasets Shape Models in Facial Expression Recognition [13.77824359359967]
We study bias propagation from datasets to trained models in image-based Facial Expression Recognition systems. We introduce new bias metrics specifically designed for multiclass problems with multiple demographic groups. Our findings suggest that preventing emotion-specific demographic patterns should be prioritized over general demographic balance in FER datasets.
arXiv Detail & Related papers (2025-03-05T12:25:22Z)
- How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and its manifestation in LLMs. Our findings reveal that biases present in pre-training data are amplified in model outputs.
arXiv Detail & Related papers (2024-11-28T16:20:25Z)
- Understanding the Interplay of Scale, Data, and Bias in Language Models: A Case Study with BERT [4.807994469764776]
We study the influence of model scale and pre-training data on a language model's learnt social biases.
Our experiments show that pre-training data substantially influences how upstream biases evolve with model scale.
We shed light on the complex interplay of data and model scale, and investigate how it translates to concrete biases.
arXiv Detail & Related papers (2024-07-25T23:09:33Z)
- Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training [7.5041863920639456]
Machine learning systems often acquire biases by leveraging undesired features in the data, impacting accuracy across different sub-populations. This paper explores the evolution of bias in a teacher-student setup modeling different data sub-populations with a Gaussian-mixture model. Applying our findings to fairness and robustness, we delineate how and when heterogeneous data and spurious features can generate and amplify bias.
arXiv Detail & Related papers (2024-05-28T15:50:10Z)
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Analyzing Bias in Diffusion-based Face Generation Models [75.80072686374564]
Diffusion models are increasingly popular in synthetic data generation and image editing applications.
We investigate the presence of bias in diffusion-based face generation models with respect to attributes such as gender, race, and age.
We examine how dataset size affects the attribute composition and perceptual quality of both diffusion and Generative Adversarial Network (GAN) based face generation models.
arXiv Detail & Related papers (2023-05-10T18:22:31Z)
- The Birth of Bias: A case study on the evolution of gender bias in an English language model [1.6344851071810076]
We use a relatively small language model with an LSTM architecture, trained on an English Wikipedia corpus.
We find that the representation of gender is dynamic and identify different phases during training.
We show that gender information is represented increasingly locally in the input embeddings of the model.
arXiv Detail & Related papers (2022-07-21T00:59:04Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, analogous to gradient descent in functional space.
GGD can learn a more robust base model in both settings: task-specific biased models with prior knowledge and a self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)