Related papers: HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns

URL: http://arxiv.org/abs/2601.10198v1
Date: Thu, 15 Jan 2026 08:56:53 GMT
Title: HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns
Authors: Xintao Wang, Jian Yang, Weiyuan Li, Rui Xie, Jen-tse Huang, Jun Gao, Shuai Huang, Yueping Kang, Liyuan Gou, Hongwei Feng, Yanghua Xiao
Abstract summary: We present HUMANLLM, a framework treating psychological patterns as interacting causal forces. We construct 244 patterns from ~12,000 academic papers and synthesize 11,359 scenarios where 2-5 patterns reinforce, conflict, or modulate each other. Our dual-level checklists evaluate both individual pattern fidelity and emergent multi-pattern dynamics, achieving strong human alignment.
Score: 59.17423586203706
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning and generation, serving as the foundation for advanced persona simulation and Role-Playing Language Agents (RPLAs). However, achieving authentic alignment with human cognitive and behavioral patterns remains a critical challenge for these agents. We present HUMANLLM, a framework treating psychological patterns as interacting causal forces. We construct 244 patterns from ~12,000 academic papers and synthesize 11,359 scenarios where 2-5 patterns reinforce, conflict, or modulate each other, with multi-turn conversations expressing inner thoughts, actions, and dialogue. Our dual-level checklists evaluate both individual pattern fidelity and emergent multi-pattern dynamics, achieving strong human alignment (r=0.91) while revealing that holistic metrics conflate simulation accuracy with social desirability. HUMANLLM-8B outperforms Qwen3-32B on multi-pattern dynamics despite 4x fewer parameters, demonstrating that authentic anthropomorphism requires cognitive modeling--simulating not just what humans do, but the psychological processes generating those behaviors.

Related papers

Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests [5.119837168333715]
This study examines whether the personality traits of Large Multimodal Models (LMMs) can be assessed through non-language-based modalities. Evaluators demonstrated an excellent ability to understand and analyze TAT responses.
arXiv Detail & Related papers (2026-02-19T06:08:33Z)
HumanLLM: Towards Personalized Understanding and Simulation of Human Nature [72.55730315685837]
HumanLLM is a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
arXiv Detail & Related papers (2026-01-22T09:27:27Z)
CogToM: A Comprehensive Theory of Mind Benchmark inspired by Human Cognition for Large Language Models [8.120889327955032]
We introduce CogToM, a comprehensive, theoretically grounded benchmark comprising over 8000 bilingual instances across 46 paradigms. A systematic evaluation of 22 representative models, including frontier models like GPT-5.1 and Qwen3-Max, reveals significant performance heterogeneities and highlights persistent bottlenecks in specific dimensions. CogToM offers a robust instrument and perspective for investigating the evolving cognitive boundaries of Large Language Models.
arXiv Detail & Related papers (2026-01-22T03:59:19Z)
Cognitive Foundations for Reasoning and Their Manifestation in LLMs [63.12951576410617]
Large language models (LLMs) solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning. We synthesize cognitive science research into a taxonomy of 28 cognitive elements spanning reasoning invariants, meta-cognitive controls, representations for organizing reasoning & knowledge, and transformation operations. We develop test-time reasoning guidance that automatically scaffolds successful structures, improving performance by up to 66.7% on complex problems.
arXiv Detail & Related papers (2025-11-20T18:59:00Z)
TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation [55.55404595177229]
Large Language Models (LLMs) are exhibiting emergent human-like abilities. TwinVoice is a benchmark for assessing persona simulation across diverse real-world contexts.
arXiv Detail & Related papers (2025-10-29T14:00:42Z)
Modeling Layered Consciousness with Multi-Agent Large Language Models [9.566692471247995]
We propose a framework for modeling artificial consciousness in large language models (LLMs). Our Psychodynamic Model simulates self-awareness, preconsciousness, and unconsciousness through agent interaction.
arXiv Detail & Related papers (2025-10-10T07:08:34Z)
Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models [20.077019480409657]
The tendency of users to anthropomorphise large language models (LLMs) is of growing interest to AI developers, researchers, and policy-makers. Here, we present a novel method for empirically evaluating anthropomorphic LLM behaviours in realistic and varied settings. First, we develop a multi-turn evaluation of 14 anthropomorphic behaviours. Second, we present a scalable, automated approach by employing simulations of user interactions. Third, we conduct an interactive, large-scale human subject study (N=1101) to validate that the model behaviours we measure predict real users' anthropomorphic perceptions.
arXiv Detail & Related papers (2025-02-10T22:09:57Z)
Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models [2.9312156642007294]
We systematically review Large Language Models' capabilities across three important cognitive domains: decision-making biases, reasoning, and creativity. On decision-making, our synthesis reveals that while LLMs demonstrate several human-like biases, some biases observed in humans are absent. On reasoning, advanced LLMs like GPT-4 exhibit deliberative reasoning akin to human System-2 thinking, while smaller models fall short of human-level performance. A distinct dichotomy emerges in creativity: while LLMs excel in language-based creative tasks, such as storytelling, they struggle with divergent thinking tasks that require real-world context.
arXiv Detail & Related papers (2024-12-20T02:26:56Z)
PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning. For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction. For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts. In Human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale children interactive actions that start and end along the timeline. This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z)
User Behavior Simulation with Large Language Model based Agents [116.74368915420065]
We propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors. Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans.
arXiv Detail & Related papers (2023-06-05T02:58:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.