Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization
- URL: http://arxiv.org/abs/2505.16467v1
- Date: Thu, 22 May 2025 09:48:51 GMT
- Title: Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization
- Authors: Vera Neplenbroek, Arianna Bisazza, Raquel Fernández
- Abstract summary: Generative Large Language Models (LLMs) infer a user's demographic information from subtle cues in the conversation. Our results highlight the need for greater transparency and control in how LLMs represent user identity.
- Score: 6.781972039785424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Large Language Models (LLMs) infer a user's demographic information from subtle cues in the conversation -- a phenomenon called implicit personalization. Prior work has shown that such inferences can lead to lower-quality responses for users assumed to be from minority groups, even when no demographic information is explicitly provided. In this work, we systematically explore how LLMs respond to stereotypical cues using controlled synthetic conversations, analyzing the models' latent user representations through both model internals and generated answers to targeted user questions. Our findings reveal that LLMs do infer demographic attributes from these stereotypical signals, an inference that for a number of groups persists even when the user explicitly identifies with a different demographic group. Finally, we show that this form of stereotype-driven implicit personalization can be effectively mitigated by intervening on the model's internal representations, using a trained linear probe to steer them toward the explicitly stated identity. Our results highlight the need for greater transparency and control in how LLMs represent user identity.
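The mitigation described in the abstract amounts to training a linear probe on the model's hidden states and then using the probe's direction to push the latent user representation toward the explicitly stated identity. The snippet below is a minimal, self-contained sketch of that general recipe, not the authors' exact procedure: it substitutes synthetic vectors for real LLM hidden states, and the names (`hidden_dim`, `steer_to_group`, the steering strength `alpha`) are illustrative assumptions.

```python
# Minimal sketch: probe-based steering of latent user representations.
# Hypothetical setup; the real method operates on LLM hidden states at a
# chosen layer, here replaced by synthetic vectors for self-containment.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden_dim = 64  # stand-in for the LLM's hidden size (assumption)

# Synthetic "hidden states" for two demographic groups (labels 0 and 1),
# separated along a random ground-truth direction.
true_dir = rng.normal(size=hidden_dim)
true_dir /= np.linalg.norm(true_dir)
labels = rng.integers(0, 2, size=1000)
states = rng.normal(size=(1000, hidden_dim)) + np.outer(2.0 * labels - 1.0, true_dir)

# 1) Train a linear probe that reads the inferred group off a hidden state.
probe = LogisticRegression(max_iter=1000).fit(states, labels)

def steer_to_group(h, target, probe, alpha=4.0):
    """Shift activation h along the probe's weight direction so the probe
    favours `target` (0 or 1); `alpha` is an assumed steering strength."""
    w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
    sign = 1.0 if target == 1 else -1.0
    return h + sign * alpha * w

# 2) Take a hidden state the probe reads as group 0 and steer it toward group 1.
h = states[labels == 0][0]
print("probe before:", probe.predict([h])[0])
print("probe after: ", probe.predict([steer_to_group(h, 1, probe)])[0])
```

In the actual setting, the probe would be trained on activations extracted from the conversation, and the same directional update would be applied during generation to align the model's internal user representation with the identity the user explicitly stated.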
Related papers
- Teaching Language Models To Gather Information Proactively [53.85419549904644]
Large language models (LLMs) are increasingly expected to function as collaborative partners. In this work, we introduce a new task paradigm: proactive information gathering. We design a scalable framework that generates partially specified, real-world tasks, masking key information. Within this setup, our core innovation is a reinforcement finetuning strategy that rewards questions that elicit genuinely new, implicit user information.
arXiv Detail & Related papers (2025-07-28T23:50:09Z)
- Biases in LLM-Generated Musical Taste Profiles for Recommendation [6.482557558168364]
Large Language Models (LLMs) for recommendation can generate Natural Language (NL) user taste profiles from consumption data. But it remains unclear whether users consider these profiles to be an accurate representation of their taste. We study this issue in the context of music streaming, where personalization is challenged by a large and culturally diverse catalog.
arXiv Detail & Related papers (2025-07-22T15:44:10Z)
- The Prompt Makes the Person(a): A Systematic Evaluation of Sociodemographic Persona Prompting for Large Language Models [3.2919397230854983]
We show how different persona prompt strategies, specifically role adoption formats and demographic priming strategies, influence large language models. Our findings show that LLMs struggle to simulate marginalized groups, particularly nonbinary, Hispanic, and Middle Eastern identities. Specifically, we find that prompting in an interview-style format and name-based priming can help reduce stereotyping and improve alignment.
arXiv Detail & Related papers (2025-07-21T21:23:29Z)
- Position is Power: System Prompts as a Mechanism of Bias in Large Language Models (LLMs) [7.71667852309443]
System prompts in Large Language Models (LLMs) are predefined directives that guide model behaviour. LLM deployers increasingly use them to ensure consistent responses across contexts. As system prompts become more complex, they can directly or indirectly introduce unaccounted-for side effects.
arXiv Detail & Related papers (2025-05-27T12:19:08Z)
- Investigating and Mitigating Stereotype-aware Unfairness in LLM-based Recommendations [18.862841015556995]
Large Language Models (LLMs) have demonstrated unprecedented language understanding and reasoning capabilities. Recent studies have revealed that LLMs are likely to inherit stereotypes that are embedded ubiquitously in word embeddings. This study introduces a new variant of fairness, defined over stereotype groups containing both users and items, to quantify discrimination against stereotypes in LLM-based recommender systems (LLM-RS).
arXiv Detail & Related papers (2025-04-05T15:09:39Z)
- Stereotype or Personalization? User Identity Biases Chatbot Recommendations [54.38329151781466]
We show that large language models (LLMs) produce recommendations that reflect both what the user wants and who the user is.
We find that models generate racially stereotypical recommendations regardless of whether the user revealed their identity intentionally.
Our experiments show that even though a user's revealed identity significantly influences model recommendations, model responses obfuscate this fact in response to user queries.
arXiv Detail & Related papers (2024-10-08T01:51:55Z)
- LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model [58.887561071010985]
Personality detection aims to identify one's personality traits underlying social media posts.
Most existing methods learn post features directly by fine-tuning pre-trained language models.
We propose a large language model (LLM) based text augmentation enhanced personality detection model.
arXiv Detail & Related papers (2024-03-12T12:10:18Z)
- Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes [73.12947922129261]
We leverage the zero-shot capabilities of large language models to reduce stereotyping.
We show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups.
We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
arXiv Detail & Related papers (2024-02-03T01:40:11Z)
- RecExplainer: Aligning Large Language Models for Explaining Recommendation Models [50.74181089742969]
Large language models (LLMs) have demonstrated remarkable intelligence in understanding, reasoning, and instruction following.
This paper presents the initial exploration of using LLMs as surrogate models to explain black-box recommender models.
To facilitate an effective alignment, we introduce three methods: behavior alignment, intention alignment, and hybrid alignment.
arXiv Detail & Related papers (2023-11-18T03:05:43Z)
- Sociodemographic Prompting is Not Yet an Effective Approach for Simulating Subjective Judgments with LLMs [13.744746481528711]
Large Language Models (LLMs) are widely used to simulate human responses across diverse contexts. We evaluate nine popular LLMs on their ability to understand demographic differences in two subjective judgment tasks: politeness and offensiveness. We find that in zero-shot settings, most models' predictions for both tasks align more closely with labels from White participants than with those from Asian or Black participants.
arXiv Detail & Related papers (2023-11-16T10:02:24Z)
- On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
- ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful for triggering hallucinations in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z)