How Quantization Shapes Bias in Large Language Models
- URL: http://arxiv.org/abs/2508.18088v1
- Date: Mon, 25 Aug 2025 14:48:26 GMT
- Title: How Quantization Shapes Bias in Large Language Models
- Authors: Federico Marcuzzi, Xuefei Ning, Roy Schwartz, Iryna Gurevych
- Abstract summary: We focus on weight and activation quantization strategies and examine their effects across a broad range of bias types. We employ both probabilistic and generated text-based metrics across nine benchmarks and evaluate models varying in architecture family and reasoning ability.
- Score: 61.40435736418359
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents a comprehensive evaluation of how quantization affects model bias, with particular attention to its impact on individual demographic subgroups. We focus on weight and activation quantization strategies and examine their effects across a broad range of bias types, including stereotypes, toxicity, sentiment, and fairness. We employ both probabilistic and generated text-based metrics across nine benchmarks and evaluate models varying in architecture family and reasoning ability. Our findings show that quantization has a nuanced impact on bias: while it can reduce model toxicity and does not significantly impact sentiment, it tends to slightly increase stereotypes and unfairness in generative tasks, especially under aggressive compression. These trends are generally consistent across demographic categories and model types, although their magnitude depends on the specific setting. Overall, our results highlight the importance of carefully balancing efficiency and ethical considerations when applying quantization in practice.
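The paper evaluates existing quantization schemes rather than introducing code, but the object of study is easy to make concrete. Below is a minimal sketch of one common weight-quantization strategy, symmetric per-tensor int8 round-to-nearest, written in plain NumPy; the function names and toy tensor are illustrative assumptions, not the authors' actual setup, which spans several weight and activation schemes.

```python
import numpy as np

def quantize_weights_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 round-to-nearest weight quantization."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)  # guard all-zero tensors
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 weights back to an approximate float tensor."""
    return q.astype(np.float32) * scale

# Toy check: the rounding error introduced here is exactly the kind of
# perturbation whose downstream effect on bias metrics the paper measures.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_weights_int8(w)
print("mean abs quantization error:", float(np.abs(w - dequantize(q, scale)).mean()))
```

More aggressive compression (e.g., 4-bit) shrinks the representable grid further, which is the regime where the abstract reports the largest shifts in stereotype and fairness scores.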
Related papers
- Race, Ethnicity and Their Implication on Bias in Large Language Models [9.202525724606188]
We study how race and ethnicity are represented and operationalized within large language models (LLMs). We find that demographic information is distributed across internal units with substantial cross-model variation. Interventions suppressing such neurons reduce bias but leave substantial residual effects.
arXiv Detail & Related papers (2026-01-19T09:24:24Z)
- Class-Dependent Perturbation Effects in Evaluating Time Series Attributions [5.136283512042341]
We show previously overlooked class-dependent effects in feature attribution metrics. Our analysis suggests that perturbation-based evaluation may reflect specific model behaviors rather than intrinsic attribution quality. We propose an evaluation framework with a class-aware penalty term to help assess and account for these effects.
arXiv Detail & Related papers (2025-02-24T10:22:03Z)
- Biased or Flawed? Mitigating Stereotypes in Generative Language Models by Addressing Task-Specific Flaws [12.559028963968247]
Generative language models often reflect and amplify societal biases in their outputs. We propose a targeted stereotype mitigation framework that implicitly mitigates observed stereotypes in generative models. We reduce stereotypical outputs by over 60% across multiple dimensions.
arXiv Detail & Related papers (2024-12-16T03:29:08Z)
- Leveraging Large Language Models and Topic Modeling for Toxicity Classification [2.1506858566021037]
We investigate the impact of annotator positionality on the dataset while using topic-modeling strategies for content moderation. Results indicate that fine-tuning the models on specific topics yields a notable improvement in their F1 scores (see the sketch after this entry).
arXiv Detail & Related papers (2024-11-26T20:47:24Z)
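The per-topic idea in that summary can be sketched with classical stand-ins: LDA topics and a logistic-regression classifier per topic instead of LLM fine-tuning. Everything below (the function name, parameters, and in-sample F1) is an illustrative assumption, not the paper's pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def per_topic_f1(texts, labels, n_topics=3, seed=0):
    """Assign each text its dominant LDA topic, then fit and score one
    classifier per topic (a toy stand-in for per-topic fine-tuning)."""
    X = CountVectorizer(stop_words="english").fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    topics = lda.fit_transform(X).argmax(axis=1)  # dominant topic per document
    labels = np.asarray(labels)
    scores = {}
    for t in range(n_topics):
        idx = np.where(topics == t)[0]
        if len(idx) == 0 or len(np.unique(labels[idx])) < 2:
            continue  # a topic needs examples of both classes to train on
        clf = LogisticRegression(max_iter=1000).fit(X[idx], labels[idx])
        scores[t] = f1_score(labels[idx], clf.predict(X[idx]))  # in-sample, demo only
    return scores
```

A held-out split per topic would be needed for a real comparison; the point here is only the shape of the pipeline: cluster first, then specialize a model per cluster.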
- Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes [72.13373216644021]
We study the societal impact of machine learning by considering the collection of models that are deployed in a given context.
We find that deployed machine learning is prone to systemic failure, meaning some users are misclassified by every model available.
These examples demonstrate that ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.
arXiv Detail & Related papers (2023-07-12T01:11:52Z)
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- De-biasing "bias" measurement [20.049916973204102]
We show that metrics used to measure group-wise model performance disparities are themselves statistically biased estimators of the underlying quantities they purport to represent.
We propose the "double-corrected" variance estimator, which provides unbiased estimates and uncertainty quantification of the variance of model performance across groups (a simplified illustration of the correction follows this entry).
arXiv Detail & Related papers (2022-05-11T20:51:57Z)
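The statistical point behind that estimator can be illustrated without reproducing the paper's construction. The naive variance of observed per-group accuracies overstates the true between-group variance, because each observed accuracy carries binomial sampling noise; a moment-based correction subtracts an estimate of that noise. This is a hedged simplification, not the authors' "double-corrected" estimator.

```python
import numpy as np

def corrected_group_variance(successes, counts):
    """Estimate the variance of true per-group accuracies.

    naive: sample variance of observed accuracies (inflated by noise).
    corrected: naive minus an unbiased estimate of the per-group
    binomial sampling variance, p(1-p)/(n-1). Illustrative only.
    """
    s = np.asarray(successes, dtype=float)
    n = np.asarray(counts, dtype=float)
    p = s / n
    naive = p.var(ddof=1)
    noise = np.mean(p * (1.0 - p) / (n - 1.0))
    return naive, max(naive - noise, 0.0)

naive, corrected = corrected_group_variance([90, 70, 85], [100, 100, 100])
print(f"naive={naive:.4f}  corrected={corrected:.4f}")  # corrected is smaller
```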
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies for dealing with this problem by balancing the number of examples of each class (a minimal sketch follows this entry).
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
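As a minimal illustration of the oversampling side, the sketch below duplicates minority-class rows with replacement until every class matches the majority count. Libraries such as imbalanced-learn package this (e.g., RandomOverSampler); the plain-NumPy version here is a self-contained sketch, not the strategy the paper ends up recommending.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Balance classes by resampling each minority class up to the majority size."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    parts_X, parts_y = [], []
    for c, n in zip(classes, counts):
        idx = np.where(y == c)[0]
        extra = rng.choice(idx, size=n_max - n, replace=True)  # fill the shortfall
        keep = np.concatenate([idx, extra])
        parts_X.append(X[keep])
        parts_y.append(y[keep])
    return np.concatenate(parts_X), np.concatenate(parts_y)

# Toy usage: a 9:1 class ratio becomes 1:1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 9 + [1])
Xb, yb = random_oversample(X, y)
print(np.unique(yb, return_counts=True))  # (array([0, 1]), array([9, 9]))
```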
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities (a toy threshold-based illustration follows this entry).
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
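A brute-force toy version of that range computation: treat every decision threshold whose accuracy lies within eps of the best as a member of the "set of good models", then report the smallest and largest positive-rate gap between two groups inside that set. The paper's tractable algorithms are more general (and handle selective labels); this grid search and all its names are illustrative assumptions.

```python
import numpy as np

def disparity_range(scores, y, group, eps=0.02):
    """Min/max group positive-rate gap over near-optimal thresholds."""
    scores, y, group = map(np.asarray, (scores, y, group))
    accs, gaps = [], []
    for t in np.unique(scores):
        pred = scores >= t
        accs.append((pred == y).mean())
        gaps.append(abs(pred[group == 0].mean() - pred[group == 1].mean()))
    accs, gaps = np.array(accs), np.array(gaps)
    good = accs >= accs.max() - eps  # the "set of good models"
    return gaps[good].min(), gaps[good].max()

# Toy data: two groups, labels weakly tied to group membership.
rng = np.random.default_rng(1)
scores = rng.random(500)
group = rng.integers(0, 2, 500)
y = (scores + 0.1 * group + rng.normal(0, 0.2, 500) > 0.6).astype(int)
lo, hi = disparity_range(scores, y, group)
print(f"disparity range over good models: [{lo:.3f}, {hi:.3f}]")
```

A wide interval here is the paper's point: models that look equivalent on accuracy can differ substantially in group-level disparity.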