The Need for a Socially-Grounded Persona Framework for User Simulation
- URL: http://arxiv.org/abs/2601.07110v1
- Date: Mon, 12 Jan 2026 00:27:18 GMT
- Title: The Need for a Socially-Grounded Persona Framework for User Simulation
- Authors: Pranav Narayanan Venkit, Yu Li, Yada Pruksachatkun, Chien-Sheng Wu
- Abstract summary: We introduce SCOPE, a framework for persona construction and evaluation. We find that demographic-only personas are a structural bottleneck. Adding sociopsychological facets improves behavioral prediction and reduces over-accentuation. Our results indicate that persona quality depends on sociopsychological structure rather than demographic templates or summaries.
- Score: 32.09483697866529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic personas are widely used to condition large language models (LLMs) for social simulation, yet most personas are still constructed from coarse sociodemographic attributes or summaries. We revisit persona creation by introducing SCOPE, a socially grounded framework for persona construction and evaluation, built from a 141-item, two-hour sociopsychological protocol collected from 124 U.S.-based participants. Across seven models, we find that demographic-only personas are a structural bottleneck: demographics explain only ~1.5% of variance in human response similarity. Adding sociopsychological facets improves behavioral prediction and reduces over-accentuation, and non-demographic personas based on values and identity achieve strong alignment with substantially lower bias. These trends generalize to SimBench (441 aligned questions), where SCOPE personas outperform default prompting and NVIDIA Nemotron personas, and SCOPE augmentation improves Nemotron-based personas. Our results indicate that persona quality depends on sociopsychological structure rather than demographic templates or summaries.
Related papers
- Persona Prompting as a Lens on LLM Social Reasoning [5.001433675691563]
For socially sensitive tasks like hate speech detection, the quality of explanations from Large Language Models (LLMs) is crucial. While persona prompting (PP) is increasingly used as a way to steer the model towards user-specific generation, its effect on model rationales remains underexplored.
arXiv Detail & Related papers (2026-01-28T16:41:17Z)
- HumanLLM: Towards Personalized Understanding and Simulation of Human Nature [72.55730315685837]
HumanLLM is a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
arXiv Detail & Related papers (2026-01-22T09:27:27Z)
- HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns [59.17423586203706]
We present HUMANLLM, a framework treating psychological patterns as interacting causal forces. We construct 244 patterns from 12,000 academic papers and synthesize 11,359 scenarios where 2-5 patterns reinforce, conflict, or modulate each other. Our dual-level checklists evaluate both individual pattern fidelity and emergent multi-pattern dynamics, achieving strong human alignment.
arXiv Detail & Related papers (2026-01-15T08:56:53Z)
- DeepPersona: A Generative Engine for Scaling Deep Synthetic Personas [13.83414782465312]
We introduce DEEPPERSONA, a scalable generative engine for synthesizing narrative-complete synthetic personas. First, we algorithmically construct the largest-ever human-attribute taxonomy, comprising hundreds of hierarchically organized attributes. We conditionally generate coherent and realistic personas that average hundreds of structured attributes and roughly 1 MB of narrative text.
arXiv Detail & Related papers (2025-11-10T17:37:56Z)
- TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation [55.55404595177229]
Large Language Models (LLMs) are exhibiting emergent human-like abilities. TwinVoice is a benchmark for assessing persona simulation across diverse real-world contexts.
arXiv Detail & Related papers (2025-10-29T14:00:42Z)
- Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation. Our approach begins by leveraging large language models to generate narrative personas from long-term social media data. To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
arXiv Detail & Related papers (2025-09-12T10:43:47Z)
- Generative Agent Simulations of 1,000 People [56.82159813294894]
We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals.
The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers.
Our architecture reduces accuracy biases across racial and ideological groups compared to agents given demographic descriptions.
arXiv Detail & Related papers (2024-11-15T11:14:34Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z)