Leveraging In-Context Learning for Political Bias Testing of LLMs
- URL: http://arxiv.org/abs/2506.22232v1
- Date: Fri, 27 Jun 2025 13:49:37 GMT
- Title: Leveraging In-Context Learning for Political Bias Testing of LLMs
- Authors: Patrick Haller, Jannis Vamvas, Rico Sennrich, Lena A. Jäger
- Abstract summary: We propose a new probing task, Questionnaire Modeling (QM), that uses human survey data as in-context examples. We show that QM improves the stability of question-based bias evaluation, and demonstrate that it may be used to compare instruction-tuned models to their base versions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A growing body of work has been querying LLMs with political questions to evaluate their potential biases. However, this probing method has limited stability, making comparisons between models unreliable. In this paper, we argue that LLMs need more context. We propose a new probing task, Questionnaire Modeling (QM), that uses human survey data as in-context examples. We show that QM improves the stability of question-based bias evaluation, and demonstrate that it may be used to compare instruction-tuned models to their base versions. Experiments with LLMs of various sizes indicate that instruction tuning can indeed change the direction of bias. Furthermore, we observe a trend that larger models are able to leverage in-context examples more effectively, and generally exhibit smaller bias scores in QM. Data and code are publicly available.
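The core idea of Questionnaire Modeling, prepending human survey responses as in-context examples before the target question, can be sketched as follows. This is a minimal illustration only: the question texts, answer values, and the `build_qm_prompt` helper are invented for this example and are not taken from the paper's data or code.

```python
# Hypothetical sketch of a QM-style prompt: human survey answers are
# provided as in-context examples before the target question, so the
# model is conditioned on real response behavior rather than asked cold.

def build_qm_prompt(examples, target_question, scale):
    """Assemble an in-context prompt from (question, answer) survey pairs."""
    lines = [
        "Below are survey questions answered on a scale of "
        f"{scale[0]} ({scale[1]}) to {scale[2]} ({scale[3]}).",
        "",
    ]
    for question, answer in examples:
        lines.append(f"Question: {question}")
        lines.append(f"Answer: {answer}")
        lines.append("")
    # The model is asked to continue after "Answer:" for the target item.
    lines.append(f"Question: {target_question}")
    lines.append("Answer:")
    return "\n".join(lines)

# Invented example survey items on a 1-5 agreement scale.
survey_examples = [
    ("The government should reduce income inequality.", 4),
    ("Environmental regulation hurts economic growth.", 2),
]
prompt = build_qm_prompt(
    survey_examples,
    "Immigration enriches cultural life.",
    (1, "strongly disagree", 5, "strongly agree"),
)
```

Because the in-context examples come from human survey data, the same prompt template can be applied to a base model and its instruction-tuned version, which is what makes the paper's direct comparison between the two possible.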
Related papers
- Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models [49.41113560646115]
We investigate various proxy measures of bias in large language models (LLMs).
We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores.
With the recent trend toward LLM assistant memory and personalization, these problems open up from a different angle.
arXiv Detail & Related papers (2025-06-12T08:47:40Z)
- Relative Bias: A Comparative Framework for Quantifying Bias in LLMs [29.112649816695203]
Relative Bias is a method designed to assess how an LLM's behavior deviates from other LLMs within a specified target domain.
We introduce two complementary methodologies: (1) Embedding Transformation analysis, which captures relative bias patterns through sentence representations in the embedding space, and (2) LLM-as-a-Judge, which employs a language model to evaluate outputs comparatively.
Applying our framework to several case studies on bias and alignment scenarios, followed by statistical tests for validation, we find strong alignment between the two scoring methods.
arXiv Detail & Related papers (2025-05-22T01:59:54Z)
- DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs [1.89915151018241]
We argue that implicit bias in Large Language Models (LLMs) is not only an ethical but also a technical issue.
We developed a method for calculating an easily interpretable benchmark, DIF (Demographic Implicit Fairness).
arXiv Detail & Related papers (2025-05-15T06:53:37Z)
- Systematic Bias in Large Language Models: Discrepant Response Patterns in Binary vs. Continuous Judgment Tasks [13.704342633541454]
Large Language Models (LLMs) are increasingly used in tasks such as psychological text analysis and decision-making in automated systems.
This study examines how different response formats, binary versus continuous, may systematically influence LLMs' judgments.
arXiv Detail & Related papers (2025-04-28T03:20:55Z)
- No LLM is Free From Bias: A Comprehensive Study of Bias Evaluation in Large Language Models [0.9620910657090186]
Large Language Models (LLMs) have improved performance on a range of natural language understanding and generation tasks.
We provide a unified evaluation of benchmarks using a set of representative small and medium-sized LLMs.
We propose five prompting approaches to carry out the bias detection task across different aspects of bias.
The results indicate that each of the selected LLMs suffers from one form of bias or another, with the Phi-3.5B model being the least biased.
arXiv Detail & Related papers (2025-03-15T03:58:14Z)
- Preference Leakage: A Contamination Problem in LLM-as-a-judge [69.96778498636071]
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods.
In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators.
arXiv Detail & Related papers (2025-02-03T17:13:03Z)
- From Distributional to Overton Pluralism: Investigating Large Language Model Alignment [82.99849359892112]
We re-examine previously reported reductions in response diversity post-alignment.
Our analysis suggests that an apparent drop in the diversity of responses is largely explained by quality control and information aggregation.
Findings indicate that current alignment techniques capture but do not extend the useful subset of assistant-like base LLM behavior.
arXiv Detail & Related papers (2024-06-25T16:32:33Z)
- OLMES: A Standard for Language Model Evaluations [64.85905119836818]
OLMES is a documented, practical, open standard for reproducible language model evaluations.
It supports meaningful comparisons between smaller base models that require the unnatural "cloze" formulation of multiple-choice questions.
OLMES includes well-considered, documented recommendations guided by results from existing literature as well as new experiments resolving open questions.
arXiv Detail & Related papers (2024-06-12T17:37:09Z)
- Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models [61.45529177682614]
We challenge the prevailing constrained evaluation paradigm for values and opinions in large language models.
We show that models give substantively different answers when not forced.
We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
arXiv Detail & Related papers (2024-02-26T18:00:49Z) - Stick to your Role! Stability of Personal Values Expressed in Large Language Models [19.516125296160638]
We present a case-study on the stability of value expression over different contexts.
Reusing methods from psychology, we study Rank-order stability on the population.
We observe consistent trends in the stability of models and model families.
arXiv Detail & Related papers (2024-02-19T14:53:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.