Related papers: HumanLLM: Towards Personalized Understanding and Simulation of Human Nature

URL: http://arxiv.org/abs/2601.15793v1
Date: Thu, 22 Jan 2026 09:27:27 GMT
Title: HumanLLM: Towards Personalized Understanding and Simulation of Human Nature
Authors: Yuxuan Lei, Tianfu Wang, Jianxun Lian, Zhengyu Hu, Defu Lian, Xing Xie
Abstract summary: HumanLLM is a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
Score: 72.55730315685837
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Motivated by the remarkable progress of large language models (LLMs) in objective tasks like mathematics and coding, there is growing interest in their potential to simulate human behavior--a capability with profound implications for transforming social science research and customer-centric business insights. However, LLMs often lack a nuanced understanding of human cognition and behavior, limiting their effectiveness in social simulation and personalized applications. We posit that this limitation stems from a fundamental misalignment: standard LLM pretraining on vast, uncontextualized web data does not capture the continuous, situated context of an individual's decisions, thoughts, and behaviors over time. To bridge this gap, we introduce HumanLLM, a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome Dataset, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. Through a rigorous, multi-stage pipeline involving data filtering, synthesis, and quality control, we automatically extract over 5.5 million user logs to distill rich profiles, behaviors, and thinking patterns. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences. Comprehensive evaluations demonstrate that HumanLLM achieves superior performance in predicting user actions and inner thoughts, more accurately mimics user writing styles and preferences, and generates more authentic user profiles compared to base models. Furthermore, HumanLLM shows significant gains on out-of-domain social intelligence benchmarks, indicating enhanced generalization.

Related papers

Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation. Our approach begins by leveraging large language models to generate narrative personas from long-term social media data. To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
arXiv Detail & Related papers (2025-09-12T10:43:47Z)
OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation [56.47029531207105]
OPeRA is the first public dataset that comprehensively captures user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales. We establish the first benchmark to evaluate how well current LLMs can predict a specific user's next action and rationale.
arXiv Detail & Related papers (2025-06-05T21:37:49Z)
Measurement of LLM's Philosophies of Human Nature [113.47929131143766]
We design a standardized psychological scale specifically targeting large language models (LLMs). We show that current LLMs exhibit a systemic lack of trust in humans. We propose a mental loop learning framework, which enables an LLM to continuously optimize its value system.
arXiv Detail & Related papers (2025-04-03T06:22:19Z)
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data [28.900987544062257]
We introduce BIG5-CHAT, a large-scale dataset containing 100,000 dialogues designed to ground models in how humans express their personality in language. Our methods outperform prompting on personality assessments such as BFI and IPIP-NEO, with trait correlations more closely matching human data. Our experiments reveal that models trained to exhibit higher conscientiousness, higher agreeableness, lower extraversion, and lower neuroticism display better performance on reasoning tasks.
arXiv Detail & Related papers (2024-10-21T20:32:27Z)
Agentic Society: Merging skeleton from real world and texture from Large Language Model [4.740886789811429]
This paper explores a novel framework that leverages census data and large language models to generate virtual populations. We show that our method produces personas with the variability essential for simulating diverse human behaviors in social science experiments. However, the evaluation results show that only weak signs of statistical truthfulness can be produced, owing to the limited capability of current LLMs.
arXiv Detail & Related papers (2024-09-02T08:28:19Z)
Personality Alignment of Large Language Models [30.710131188931317]
Personality Alignment aims to align large language models with individual user preferences. This dataset includes data from over 320,000 real subjects across multiple personality assessments. We develop an activation intervention optimization method to efficiently align with individual behavioral preferences. Our work paves the way for future AI systems to make decisions and reason in truly personalized ways.
arXiv Detail & Related papers (2024-08-21T17:09:00Z)
PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning. For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction. For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
Human Simulacra: Benchmarking the Personification of Large Language Models [38.21708264569801]
Large language models (LLMs) are recognized as systems that closely mimic aspects of human intelligence. This paper introduces a framework for constructing virtual characters' life stories from the ground up. Experimental results demonstrate that our constructed simulacra can produce personified responses that align with their target characters.
arXiv Detail & Related papers (2024-02-28T09:11:14Z)
User Behavior Simulation with Large Language Model based Agents [116.74368915420065]
We propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors. Based on extensive experiments, we find that the behaviors simulated by our method are very close to those of real humans.
arXiv Detail & Related papers (2023-06-05T02:58:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
