Persona Generators: Generating Diverse Synthetic Personas at Scale
- URL: http://arxiv.org/abs/2602.03545v1
- Date: Tue, 03 Feb 2026 13:59:03 GMT
- Title: Persona Generators: Generating Diverse Synthetic Personas at Scale
- Authors: Davide Paglieri, Logan Cross, William A. Cunningham, Joel Z. Leibo, Alexander Sasha Vezhnevets
- Abstract summary: Evaluating AI systems that interact with humans requires understanding their behavior across diverse user populations. Recent work in Generative Agent-Based Modeling has shown that large language models can simulate human-like synthetic personas with high fidelity. We introduce Persona Generators, functions that can produce diverse synthetic populations tailored to arbitrary contexts.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Evaluating AI systems that interact with humans requires understanding their behavior across diverse user populations, but collecting representative human data is often expensive or infeasible, particularly for novel technologies or hypothetical future scenarios. Recent work in Generative Agent-Based Modeling has shown that large language models can simulate human-like synthetic personas with high fidelity, accurately reproducing the beliefs and behaviors of specific individuals. However, most approaches require detailed data about target populations and often prioritize density matching (replicating what is most probable) rather than support coverage (spanning what is possible), leaving long-tail behaviors underexplored. We introduce Persona Generators, functions that can produce diverse synthetic populations tailored to arbitrary contexts. We apply an iterative improvement loop based on AlphaEvolve, using large language models as mutation operators to refine our Persona Generator code over hundreds of iterations. The optimization process produces lightweight Persona Generators that can automatically expand small descriptions into populations of diverse synthetic personas that maximize coverage of opinions and preferences along relevant diversity axes. We demonstrate that evolved generators substantially outperform existing baselines across six diversity metrics on held-out contexts, producing populations that span rare trait combinations difficult to achieve in standard LLM outputs.
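The paper's code is not included in this listing; as a rough illustration of the evolve-and-select loop the abstract describes, the sketch below replaces the LLM mutation operator with a random-perturbation stub, and the generator itself with a toy parameterized sampler. All names here (`mutate`, `diversity_score`, `generate_population`, the `age_spread`/`stance_spread` parameters) are hypothetical, chosen only to show the structure: mutate a candidate generator, score the population it produces on a coverage metric, and keep the best.

```python
import random

def diversity_score(personas):
    # Toy support-coverage metric: count distinct (age_band, stance)
    # trait combinations present in the generated population.
    return len({(p["age_band"], p["stance"]) for p in personas})

def mutate(generator_params, rng):
    # Stand-in for the paper's LLM mutation operator, which rewrites
    # generator *code*; here we merely perturb one sampling parameter.
    new = dict(generator_params)
    key = rng.choice(list(new))
    new[key] = max(1, new[key] + rng.choice([-1, 1]))
    return new

def generate_population(params, rng, n=30):
    # Toy Persona Generator: wider "spread" parameters expose more
    # values along each diversity axis.
    age_bands = ["18-29", "30-49", "50-69", "70+"]
    stances = ["support", "oppose", "undecided"]
    return [
        {
            "age_band": rng.choice(age_bands[: params["age_spread"]]),
            "stance": rng.choice(stances[: params["stance_spread"]]),
        }
        for _ in range(n)
    ]

def evolve(iterations=200, seed=0):
    # Greedy hill-climbing variant of the iterative improvement loop:
    # mutate the current best generator, keep it if coverage improves.
    rng = random.Random(seed)
    best = {"age_spread": 1, "stance_spread": 1}
    best_score = diversity_score(generate_population(best, rng))
    for _ in range(iterations):
        cand = mutate(best, rng)
        cand["age_spread"] = min(cand["age_spread"], 4)
        cand["stance_spread"] = min(cand["stance_spread"], 3)
        score = diversity_score(generate_population(cand, rng))
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

The actual system evolves generator source code over hundreds of iterations and scores it on six diversity metrics over held-out contexts; this stub only mirrors the mutate-score-select skeleton.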
Related papers
- Synthetic Interaction Data for Scalable Personalization in Large Language Models [67.31884245564086]
We introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process. We release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories.
arXiv Detail & Related papers (2026-02-12T20:41:22Z)
- HumanLLM: Towards Personalized Understanding and Simulation of Human Nature [72.55730315685837]
HumanLLM is a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
arXiv Detail & Related papers (2026-01-22T09:27:27Z)
- CEDex: Cross-Embodiment Dexterous Grasp Generation at Scale from Human-like Contact Representations [53.37721117405022]
Cross-embodiment dexterous grasp synthesis refers to adaptively generating and optimizing grasps for various robotic hands. We propose CEDex, a novel cross-embodiment dexterous grasp synthesis method at scale. We construct the largest cross-embodiment grasp dataset to date, comprising 500K objects across four types with 20M total grasps.
arXiv Detail & Related papers (2025-09-29T12:08:04Z)
- Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation. Our approach begins by leveraging large language models to generate narrative personas from long-term social media data. To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
arXiv Detail & Related papers (2025-09-12T10:43:47Z)
- Large language model as user daily behavior data generator: balancing population diversity and individual personality [12.464365435176099]
We introduce BehaviorGen, a framework that uses large language models to generate high-quality synthetic behavior data. By simulating user behavior based on profiles and real events, BehaviorGen supports data augmentation and replacement in behavior prediction models. We evaluate its performance in scenarios such as augmentation, fine-tuning replacement, and fine-tuning augmentation, achieving significant improvements in human mobility and smartphone usage predictions.
arXiv Detail & Related papers (2025-05-23T08:22:09Z)
- Gen-C: Populating Virtual Worlds with Generative Crowds [2.1716667622896195]
We introduce Generative Crowds (Gen-C), a generative framework that produces crowd scenarios capturing agent-agent and agent-environment interactions. Gen-C employs a dual Variational Graph Autoencoder (VGAE) architecture that jointly learns connectivity patterns and node features conditioned on textual and structural signals. We demonstrate the effectiveness of Gen-C on scenarios with diverse behaviors such as a University Campus and a Train Station.
arXiv Detail & Related papers (2025-04-02T17:33:53Z)
- A multi-objective combinatorial optimisation framework for large scale hierarchical population synthesis [1.2233362977312945]
In agent-based simulations, synthetic populations of agents are commonly used to represent the structure, behaviour, and interactions of individuals.
We propose a multi-objective optimisation technique for large-scale population synthesis.
Our approach supports complex hierarchical structures between individuals and households, is scalable to large populations, and achieves minimal contingency table reconstruction error.
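The contingency table reconstruction error mentioned above is a standard fitness term in combinatorial population synthesis: cross-tabulate attribute combinations in the synthetic population and measure deviation from target counts. A minimal sketch, with hypothetical attribute names (`age_band`, `household_size`) and total absolute deviation as the error, assumed here for illustration:

```python
from collections import Counter

def contingency_table(population, attrs=("age_band", "household_size")):
    # Cross-tabulate counts of attribute-value combinations.
    return Counter(tuple(p[a] for a in attrs) for p in population)

def reconstruction_error(synthetic, target):
    # Total absolute deviation between synthetic and target cell counts,
    # summed over the union of cells in both tables.
    cells = set(synthetic) | set(target)
    return sum(abs(synthetic.get(c, 0) - target.get(c, 0)) for c in cells)
```

A multi-objective optimiser would minimise this error jointly across several contingency tables (e.g. individual-level and household-level marginals).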
arXiv Detail & Related papers (2024-07-03T15:01:12Z)
- On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
- What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability [28.403105682913374]
We characterise the extent to which human production varies lexically, syntactically, and semantically across four Natural Language Generation (NLG) tasks.
We then inspect the space of output strings shaped by a generation system's predicted probability distribution and decoding algorithm to probe its uncertainty.
We analyse NLG models and decoding strategies, demonstrating that probing a generator with multiple samples provides the level of detail necessary to gain understanding of a model's representation of uncertainty.
arXiv Detail & Related papers (2023-05-19T14:41:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.