Evaluating LLM Adaptation to Sociodemographic Factors: User Profile vs. Dialogue History
- URL: http://arxiv.org/abs/2505.21362v1
- Date: Tue, 27 May 2025 15:52:39 GMT
- Title: Evaluating LLM Adaptation to Sociodemographic Factors: User Profile vs. Dialogue History
- Authors: Qishuai Zhong, Zongmin Li, Siqi Fan, Aixin Sun
- Abstract summary: We propose a framework to evaluate large language models' adaptation when attributes are introduced explicitly via user profiles in the prompt or implicitly through multi-turn dialogue history. Our findings indicate that most models adjust their expressed values in response to demographic changes, particularly in age and education level, but consistency varies. Models with stronger reasoning capabilities demonstrate greater alignment, indicating the importance of reasoning in robust sociodemographic adaptation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective engagement by large language models (LLMs) requires adapting responses to users' sociodemographic characteristics, such as age, occupation, and education level. While many real-world applications leverage dialogue history for contextualization, existing evaluations of LLMs' behavioral adaptation often focus on single-turn prompts. In this paper, we propose a framework to evaluate LLM adaptation when attributes are introduced either (1) explicitly via user profiles in the prompt or (2) implicitly through multi-turn dialogue history. We assess the consistency of model behavior across these modalities. Using a multi-agent pipeline, we construct a synthetic dataset pairing dialogue histories with distinct user profiles and employ questions from the Value Survey Module (VSM 2013) (Hofstede and Hofstede, 2016) to probe value expression. Our findings indicate that most models adjust their expressed values in response to demographic changes, particularly in age and education level, but consistency varies. Models with stronger reasoning capabilities demonstrate greater alignment, indicating the importance of reasoning in robust sociodemographic adaptation.
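For illustration, here is a minimal sketch (not the authors' pipeline) of the two evaluation modalities: the same VSM-style item is posed either with an explicit user profile in the prompt or after a short dialogue history that implies the same attributes. `query_model` is a hypothetical stand-in for any chat-completion API.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def explicit_profile(profile: str, question: str) -> List[Message]:
    """Attributes stated directly in the prompt."""
    return [
        {"role": "system", "content": f"You are assisting this user: {profile}."},
        {"role": "user", "content": question},
    ]

def implicit_history(history: List[Message], question: str) -> List[Message]:
    """Attributes only implied by earlier turns."""
    return history + [{"role": "user", "content": question}]

def probe(query_model: Callable[[List[Message]], str]) -> None:
    profile = "a 62-year-old retired teacher with a postgraduate degree"
    history = [
        {"role": "user", "content": "I taught high school for 35 years before retiring."},
        {"role": "assistant", "content": "Thirty-five years is an impressive career in education."},
    ]
    # A VSM 2013-style item, answered on a 1-5 importance scale.
    question = ("In choosing an ideal job, how important would it be to you to "
                "have security of employment? Answer 1 (of utmost importance) "
                "to 5 (of very little or no importance).")
    a = query_model(explicit_profile(profile, question))
    b = query_model(implicit_history(history, question))
    print("explicit:", a, "| implicit:", b)  # consistency check across modalities
```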
Related papers
- Revisiting LLM Value Probing Strategies: Are They Robust and Expressive? [81.49470136653665]
We evaluate the robustness and expressiveness of value representations across three widely used probing strategies. We show that the demographic context has little effect on free-text generation, and that the models' values only weakly correlate with their preference for value-based actions.
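As a rough illustration of what "probing strategies" means here, the sketch below (assumed prompts, not the paper's materials) elicits the same value three ways; a robustness check would paraphrase each prompt and measure how much the elicited value shifts.

```python
# Three common value-probing strategies for one underlying item; prompts are
# illustrative assumptions, and `query_model` is any text-in/text-out LLM call.

def likert_probe(value: str) -> str:
    return f"On a scale of 1-5, how strongly do you agree: '{value} is important to me.'"

def forced_choice_probe(value: str) -> str:
    return (f"Choose A or B. A: a job that maximizes {value}. "
            "B: a job that maximizes salary.")

def free_text_probe(value: str) -> str:
    return f"In one sentence, describe the role {value} plays in a good life."

def probe_all(query_model, value: str = "job security") -> dict:
    # Expressiveness check: do the three strategies tell a consistent story?
    probes = [likert_probe, forced_choice_probe, free_text_probe]
    return {p.__name__: query_model(p(value)) for p in probes}
```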
arXiv Detail & Related papers (2025-07-17T18:56:41Z)
- A Dual-Layered Evaluation of Geopolitical and Cultural Bias in LLMs [0.6494933736121663]
Large language models (LLMs) are increasingly deployed across diverse linguistic and cultural contexts. This paper defines two types of bias in LLMs: model bias (bias stemming from model training) and inference bias (bias induced by the language of the query). We construct a manually curated dataset spanning both factual and disputable QA, across four languages and question types.
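To make the two layers concrete, here is a hedged sketch (question set, translations, and interface are assumptions): asking one disputed question in several query languages separates bias induced by the query language from bias shared across languages.

```python
# Disagreement across query languages suggests inference bias; a skew shared
# by all languages points to model (training) bias.
QUESTION = {
    "en": "Which country do the Kuril Islands belong to?",
    "ru": "Какой стране принадлежат Курильские острова?",
    "ja": "千島列島はどの国に属していますか？",
}

def audit(query_model) -> dict:
    answers = {lang: query_model(q) for lang, q in QUESTION.items()}
    # Compare: do answers diverge by language, or agree on one side?
    return answers
```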
arXiv Detail & Related papers (2025-06-27T03:37:15Z)
- A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations [112.81207927088117]
PersonaConvBench is a benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements.
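A minimal sketch of the core comparison (interfaces assumed, not PersonaConvBench code): answer the same task with and without the user's conversational history prepended, and measure the gain.

```python
# `query_model` generates a response; `score` compares it to a reference
# (e.g., an overlap metric or an LLM judge). Both are assumed interfaces.
def with_history(history: list[str], task: str) -> str:
    return "\n".join(history) + "\n" + task

def personalization_gain(query_model, score, history, task, reference) -> float:
    personalized = score(query_model(with_history(history, task)), reference)
    baseline = score(query_model(task), reference)
    return personalized - baseline  # positive => history helped
```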
arXiv Detail & Related papers (2025-05-20T09:13:22Z)
- Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework [9.162876771766513]
We introduce AUGMENT, a framework for generating controlled, realistic prompt paraphrases based on linguistic structure and user demographics. AUGMENT ensures paraphrase quality through a combination of semantic, stylistic, and instruction-following criteria. Our findings highlight the need for more representative and structured approaches to prompt variation in large language models.
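The filtering step might look like the sketch below (names and thresholds are assumptions, not AUGMENT's API): a candidate paraphrase survives only if it clears semantic, stylistic, and instruction-following checks.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def keep_paraphrase(original: str, candidate: str, embed, same_register,
                    keeps_instruction, sim_threshold: float = 0.85) -> bool:
    # Semantic criterion: embeddings stay close to the original.
    semantic_ok = cosine(embed(original), embed(candidate)) >= sim_threshold
    # Stylistic and instruction-following criteria are assumed callables
    # (e.g., a register classifier and an instruction-preservation check).
    return semantic_ok and same_register(candidate) and keeps_instruction(original, candidate)
```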
arXiv Detail & Related papers (2025-05-06T14:17:30Z)
- Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions [33.76973308687867]
We show that models do improve at sociodemographic prompting when trained. This performance gain is largely due to models learning annotator-specific behaviour rather than sociodemographic patterns. Across all tasks, our results suggest that models learn little meaningful connection between sociodemographics and annotation.
arXiv Detail & Related papers (2025-02-28T09:53:42Z)
- Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles [37.43150003866563]
We introduce the User Simulator with Implicit Profiles (USP), a framework that infers implicit user profiles from human-machine interactions to simulate personalized and realistic dialogues. USP outperforms strong baselines in terms of authenticity and diversity while maintaining comparable consistency.
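A hedged two-stage sketch of the idea (prompts and interface are assumptions, not the released framework): infer an implicit profile from past turns, then condition the simulator on it.

```python
def infer_profile(query_model, dialogue: str) -> str:
    # Stage 1: distill an implicit profile from observed interactions.
    return query_model(
        "Read this dialogue and describe the user's likely traits, goals, and "
        f"speaking style in one short paragraph:\n{dialogue}"
    )

def simulate_user_turn(query_model, profile: str, context: str) -> str:
    # Stage 2: generate the next user turn conditioned on that profile.
    return query_model(
        f"You are a user with this profile: {profile}\n"
        f"Continue the conversation as that user:\n{context}"
    )
```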
arXiv Detail & Related papers (2025-02-26T09:26:54Z)
- Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue [1.8652965834931452]
We study the limitations of Large Language Models (LLMs) for the task of response generation in human-machine dialogue.
We extensively analyze different LLM adaptation techniques when applied to different dialogue types.
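For contrast with fine-tuning, the RAG side of the comparison can be sketched with toy lexical retrieval (an assumption, not the paper's setup): retrieve a relevant snippet and prepend it to the dialogue prompt, leaving the model's weights untouched.

```python
def retrieve(query: str, corpus: list[str]) -> str:
    # Toy lexical-overlap retrieval; a real system would use dense embeddings.
    q = set(query.lower().split())
    return max(corpus, key=lambda doc: len(q & set(doc.lower().split())))

def rag_prompt(dialogue: str, corpus: list[str]) -> str:
    last_turn = dialogue.strip().split("\n")[-1]
    snippet = retrieve(last_turn, corpus)
    return f"Background: {snippet}\nDialogue:\n{dialogue}\nResponse:"
```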
arXiv Detail & Related papers (2024-06-10T15:52:49Z)
- On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
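In its simplest form, sociodemographic prompting is just a profile prefix on an otherwise fixed task, as in this sketch (profile fields and wording are illustrative):

```python
def sociodemographic_prompt(age: int, occupation: str, education: str, task: str) -> str:
    # The profile prefix is the steering signal; the task text is held fixed.
    return (f"Imagine you are a {age}-year-old {occupation} with {education}. "
            f"Answer as that person would.\n{task}")

# Example for a subjective NLP task (the tweet text is a placeholder):
prompt = sociodemographic_prompt(
    34, "nurse", "a bachelor's degree",
    "Is the following tweet offensive? Answer yes or no: <tweet>",
)
```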
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- Unlocking the Potential of User Feedback: Leveraging Large Language Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose User-Guided Response Optimization (UGRO), an approach that combines a large language model with a smaller task-oriented dialogue (TOD) model. The LLM serves as an annotation-free user simulator that assesses dialogue responses, and its feedback is combined with smaller fine-tuned end-to-end TOD models.
Our approach outperforms previous state-of-the-art (SOTA) results.
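A hedged sketch of the simulator-as-scorer idea (prompt and interface assumed): the LLM plays the user, rates candidate responses from the smaller TOD model, and the top-rated candidate supplies the optimization signal.

```python
def simulator_score(query_model, dialogue: str, candidate: str) -> float:
    # The LLM acts as an annotation-free user simulator returning a rating.
    verdict = query_model(
        "You are the user in this dialogue. Rate from 0 to 10 how satisfied "
        "you would be with the proposed system response. Reply with a number.\n"
        f"Dialogue:\n{dialogue}\nProposed response: {candidate}\nScore:"
    )
    return float(verdict.strip().split()[0])

def pick_best(query_model, dialogue: str, candidates: list[str]) -> str:
    # The highest-rated candidate can supervise the smaller TOD model.
    return max(candidates, key=lambda c: simulator_score(query_model, dialogue, c))
```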
arXiv Detail & Related papers (2023-06-16T13:04:56Z)
- Dialogue History Matters! Personalized Response Selection in Multi-turn Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Among its contributions, our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., the personalized Ubuntu dialogue corpus (P-Ubuntu) and the personalized Weibo dataset (P-Weibo).
arXiv Detail & Related papers (2021-03-17T09:42:11Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
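A drastic simplification of such a metric (the paper learns the scorer; this sketch only shows the unreferenced shape): embed the dialogue context and the candidate response with a pre-trained encoder and score their latent similarity, with no gold response needed at inference.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def unreferenced_score(encode, context: str, response: str) -> float:
    # `encode` is any pre-trained sentence encoder (an assumed interface);
    # no ground-truth response is consulted.
    return cosine(encode(context), encode(response))
```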
arXiv Detail & Related papers (2020-05-01T20:01:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.