Related papers: LLM Generated Persona is a Promise with a Catch

LLM Generated Persona is a Promise with a Catch

URL: http://arxiv.org/abs/2503.16527v1
Date: Tue, 18 Mar 2025 03:11:27 GMT
Title: LLM Generated Persona is a Promise with a Catch
Authors: Ang Li, Haozhe Chen, Hongseok Namkoong, Tianyi Peng,
Abstract summary: Persona-based simulations hold promise for transforming disciplines that rely on population-level feedback.<n>Traditional methods to collect realistic persona data face challenges.<n>They are prohibitively expensive and logistically challenging due to privacy constraints.
Score: 18.45442859688198
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The use of large language models (LLMs) to simulate human behavior has gained significant attention, particularly through personas that approximate individual characteristics. Persona-based simulations hold promise for transforming disciplines that rely on population-level feedback, including social science, economic analysis, marketing research, and business operations. Traditional methods to collect realistic persona data face significant challenges. They are prohibitively expensive and logistically challenging due to privacy constraints, and often fail to capture multi-dimensional attributes, particularly subjective qualities. Consequently, synthetic persona generation with LLMs offers a scalable, cost-effective alternative. However, current approaches rely on ad hoc and heuristic generation techniques that do not guarantee methodological rigor or simulation precision, resulting in systematic biases in downstream tasks. Through extensive large-scale experiments including presidential election forecasts and general opinion surveys of the U.S. population, we reveal that these biases can lead to significant deviations from real-world outcomes. Our findings underscore the need to develop a rigorous science of persona generation and outline the methodological innovations, organizational and institutional support, and empirical foundations required to enhance the reliability and scalability of LLM-driven persona simulations. To support further research and development in this area, we have open-sourced approximately one million generated personas, available for public access and analysis at https://huggingface.co/datasets/Tianyi-Lab/Personas.

Related papers

Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment [49.81946749379338]
This work seeks to analyze the capacity of Transformers-based systems to learn demographic biases present in the data.<n>We propose a privacy-enhancing framework to reduce gender information from the learning pipeline as a way to mitigate biased behaviors in the final tools.
arXiv Detail & Related papers (2025-06-13T15:29:43Z)
Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions [11.751234495886674]
LLM-based digital twin simulation holds great promise for research in AI, social science, and digital experimentation.<n>We survey a representative sample of $N = 2,058$ participants (average 2.42 hours per person) in the US across four waves with 500 questions in total.<n>Initial analyses suggest the data are of high quality and show promise for constructing digital twins that predict human behavior well at the individual and aggregate levels.
arXiv Detail & Related papers (2025-05-23T05:05:11Z)
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users [70.02370111025617]
We introduce SocioVerse, an agent-driven world model for social simulation. Our framework features four powerful alignment components and a user pool of 10 million real individuals. Results demonstrate that SocioVerse can reflect large-scale population dynamics while ensuring diversity, credibility, and representativeness.
arXiv Detail & Related papers (2025-04-14T12:12:52Z)
An Overview of Large Language Models for Statisticians [109.38601458831545]
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI) This paper explores potential areas where statisticians can make important contributions to the development of LLMs. We focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation.
arXiv Detail & Related papers (2025-02-25T03:40:36Z)
Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions. As a testbed, we use country-level results from two global cultural surveys. We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
arXiv Detail & Related papers (2025-02-10T21:59:27Z)
Guided Persona-based AI Surveys: Can we replicate personal mobility preferences at scale using LLMs? [1.7819574476785418]
This study explores the potential of Large Language Models (LLMs) to generate artificial surveys. By leveraging LLMs for synthetic data creation, we aim to address the limitations of traditional survey methods. A novel approach incorporating "Personas" is introduced and compared to five other synthetic survey methods.
arXiv Detail & Related papers (2025-01-20T15:11:03Z)
Large Language Models for Market Research: A Data-augmentation Approach [3.3199591445531453]
Large Language Models (LLMs) have transformed artificial intelligence by excelling in complex natural language processing tasks.<n>Recent studies highlight a significant gap between LLM-generated and human data, with biases introduced when substituting between the two.<n>We propose a novel statistical data augmentation approach that efficiently integrates LLM-generated data with real data in conjoint analysis.
arXiv Detail & Related papers (2024-12-26T22:06:29Z)
Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication. In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness. Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z)
GenSim: A General Social Simulation Platform with Large Language Model based Agents [111.00666003559324]
We propose a novel large language model (LLMs)-based simulation platform called textitGenSim. Our platform supports one hundred thousand agents to better simulate large-scale populations in real-world contexts. To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform.
arXiv Detail & Related papers (2024-10-06T05:02:23Z)
Agentic Society: Merging skeleton from real world and texture from Large Language Model [4.740886789811429]
This paper explores a novel framework that leverages census data and large language models to generate virtual populations. We show that our method produces personas with variability essential for simulating diverse human behaviors in social science experiments. But the evaluation result shows that only weak sign of statistical truthfulness can be produced due to limited capability of current LLMs.
arXiv Detail & Related papers (2024-09-02T08:28:19Z)
PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning.<n>For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction.<n>For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
Systematic Biases in LLM Simulations of Debates [12.933509143906141]
We study the limitations of Large Language Models in simulating human interactions.<n>Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases.<n>These results underscore the need for further research to develop methods that help agents overcome these biases.
arXiv Detail & Related papers (2024-02-06T14:51:55Z)
Large Language Models as Subpopulation Representative Models: A Review [5.439020425819001]
Large language models (LLMs) could be used to estimate subpopulation representative models (SRMs) LLMs could provide an alternate or complementary way to measure public opinion among demographic, geographic, or political segments of the population.
arXiv Detail & Related papers (2023-10-27T04:31:27Z)
Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs) We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing. We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.