Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs
- URL: http://arxiv.org/abs/2509.13869v1
- Date: Wed, 17 Sep 2025 09:58:28 GMT
- Title: Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs
- Authors: Yang Liu, Chenhui Chu
- Abstract summary: Large language models (LLMs) can lead to undesired consequences when misaligned with human values. Previous studies have revealed the misalignment of LLMs with human values using expert-designed or agent-based emulated bias scenarios. In this study, we investigate the alignment of LLMs with human values regarding social biases (HVSB) in different types of bias scenarios.
- Score: 24.53996114318076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) can lead to undesired consequences when misaligned with human values, especially in scenarios involving complex and sensitive social biases. Previous studies have revealed the misalignment of LLMs with human values using expert-designed or agent-based emulated bias scenarios. However, it remains unclear whether the alignment of LLMs with human values differs across different types of scenarios (e.g., scenarios containing negative vs. non-negative questions). In this study, we investigate the alignment of LLMs with human values regarding social biases (HVSB) in different types of bias scenarios. Through extensive analysis of 12 LLMs from four model families and four datasets, we demonstrate that LLMs with large parameter scales do not necessarily have lower misalignment rates and attack success rates. Moreover, LLMs show a certain degree of alignment preference for specific types of scenarios, and LLMs from the same model family tend to have higher judgment consistency. In addition, we study the capacity of LLMs to understand HVSB through the explanations they generate for it. We find no significant differences in the understanding of HVSB across LLMs. We also find that LLMs prefer their own generated explanations. Additionally, we endow smaller language models (LMs) with the ability to explain HVSB. The generation results show that explanations generated by the fine-tuned smaller LMs are more readable but have relatively lower model agreeability.
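The abstract does not spell out how the misalignment rate or its per-scenario-type breakdown is computed. A minimal sketch of how such metrics could be derived from paired human and LLM judgments is given below; the field names (`scenario_type`, `human_label`, `llm_label`) and the toy records are hypothetical illustrations, not taken from the paper or its datasets.

```python
# Minimal sketch (not the authors' code): computing a misalignment rate from
# LLM judgments of bias scenarios, overall and per scenario type.
from collections import defaultdict

# Each record pairs a human-annotated judgment of a bias scenario with an
# LLM's judgment of the same scenario; scenario_type distinguishes, e.g.,
# negative vs. non-negative questions. All values here are made up.
judgments = [
    {"scenario_type": "negative",     "human_label": "biased",   "llm_label": "biased"},
    {"scenario_type": "negative",     "human_label": "biased",   "llm_label": "unbiased"},
    {"scenario_type": "non-negative", "human_label": "unbiased", "llm_label": "unbiased"},
    {"scenario_type": "non-negative", "human_label": "biased",   "llm_label": "unbiased"},
]

def misalignment_rate(records):
    """Fraction of scenarios where the LLM judgment disagrees with the human label."""
    disagreements = sum(r["llm_label"] != r["human_label"] for r in records)
    return disagreements / len(records)

def rate_by_scenario_type(records):
    """Misalignment rate broken down by scenario type, to compare alignment preferences."""
    groups = defaultdict(list)
    for r in records:
        groups[r["scenario_type"]].append(r)
    return {t: misalignment_rate(g) for t, g in groups.items()}

print(f"overall misalignment rate: {misalignment_rate(judgments):.2f}")
print("per scenario type:", rate_by_scenario_type(judgments))
```

An attack success rate could be computed the same way, restricted to adversarially constructed scenarios and counting cases where the LLM's judgment flips to the biased answer; the exact definitions used in the paper may differ.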
Related papers
- Blind to the Human Touch: Overlap Bias in LLM-Based Summary Evaluation [89.52571224447111]
Large language model (LLM) judges have often been used alongside traditional, algorithm-based metrics for tasks like summarization. We provide an LLM judge bias analysis as a function of overlap with human-written responses in the domain of summarization.
arXiv Detail & Related papers (2026-02-07T19:39:28Z) - Are LLMs Biased Like Humans? Causal Reasoning as a Function of Prior Knowledge, Irrelevant Information, and Reasoning Budget [0.9558392439655014]
Large language models (LLMs) are increasingly used in domains where causal reasoning matters. We benchmark 20+ LLMs against a matched human baseline on 11 causal judgment tasks formalized by a collider structure. We find that most LLMs exhibit more rule-like reasoning strategies than humans, who seem to account for unmentioned latent factors in their probability judgments.
arXiv Detail & Related papers (2026-02-03T01:43:09Z) - Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation [66.84286617519258]
Large language models (LLMs) are rapidly transforming social science research by enabling the automation of labor-intensive tasks. LLM outputs vary significantly depending on the implementation choices made by researchers. Such variation can introduce systematic biases and random errors, which propagate to downstream analyses and cause Type I, Type II, Type S, or Type M errors.
arXiv Detail & Related papers (2025-09-10T17:58:53Z) - Applying Large Language Models to Travel Satisfaction Analysis [2.5105418815378555]
This study uses household survey data collected in Shanghai to identify the existence and source of misalignment between Large Language Models (LLMs) and humans. LLMs have strong capabilities in contextual understanding and generalization, significantly reducing dependence on task-specific data. We propose an LLM-based modeling approach that can be applied to model travel behavior with small sample sizes.
arXiv Detail & Related papers (2025-05-29T09:11:58Z) - Arbiters of Ambivalence: Challenges of Using LLMs in No-Consensus Tasks [52.098988739649705]
This study examines the biases and limitations of LLMs in three roles: answer generator, judge, and debater. We develop a "no-consensus" benchmark by curating examples that encompass a variety of a priori ambivalent scenarios. Our results show that while LLMs can provide nuanced assessments when generating open-ended answers, they tend to take a stance on no-consensus topics when employed as judges or debaters.
arXiv Detail & Related papers (2025-05-28T01:31:54Z) - DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs [1.89915151018241]
We argue that implicit bias in Large Language Models (LLMs) is not only an ethical, but also a technical issue. We developed a method for calculating an easily interpretable benchmark, DIF (Demographic Implicit Fairness).
arXiv Detail & Related papers (2025-05-15T06:53:37Z) - A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification [0.0]
This study applies a straightforward ensemble strategy to sentiment analysis using large language models (LLMs). We demonstrate that ensembling multiple inferences from medium-sized LLMs produces more robust and accurate results than a single attempt with a large model, reducing RMSE by 18.6%.
arXiv Detail & Related papers (2025-04-26T10:10:26Z) - Evaluating how LLM annotations represent diverse views on contentious topics [3.405231040967506]
We show that generative large language models (LLMs) tend to be biased in the same directions on the same demographic categories within the same datasets. We conclude with a discussion of the implications for researchers and practitioners using LLMs for automated data annotation tasks.
arXiv Detail & Related papers (2025-03-29T22:53:15Z) - Preference Leakage: A Contamination Problem in LLM-as-a-judge [69.96778498636071]
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators.
arXiv Detail & Related papers (2025-02-03T17:13:03Z) - From Distributional to Overton Pluralism: Investigating Large Language Model Alignment [82.99849359892112]
We re-examine previously reported reductions in response diversity post-alignment. Our analysis suggests that an apparent drop in the diversity of responses is largely explained by quality control and information aggregation. Findings indicate that current alignment techniques capture but do not extend the useful subset of assistant-like base LLM behavior.
arXiv Detail & Related papers (2024-06-25T16:32:33Z) - A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive [53.08398658452411]
Large Language Models (LLMs) are increasingly utilized in autonomous decision-making. We show that this sampling behavior resembles that of human decision-making. We show that this deviation of a sample from the statistical norm towards a prescriptive component consistently appears in concepts across diverse real-world domains.
arXiv Detail & Related papers (2024-02-16T18:28:43Z) - On Learning to Summarize with Large Language Models as References [101.79795027550959]
Large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv Detail & Related papers (2023-05-23T16:56:04Z)