SocioBench: Modeling Human Behavior in Sociological Surveys with Large Language Models
- URL: http://arxiv.org/abs/2510.11131v1
- Date: Mon, 13 Oct 2025 08:22:20 GMT
- Title: SocioBench: Modeling Human Behavior in Sociological Surveys with Large Language Models
- Authors: Jia Wang, Ziyu Zhao, Tingjuntao Ni, Zhongyu Wei
- Abstract summary: Large language models (LLMs) show strong potential for simulating human social behaviors and interactions, yet lack large-scale, systematically constructed benchmarks for evaluating their alignment with real-world social attitudes. We introduce SocioBench, a comprehensive benchmark derived from the annually collected, standardized survey data of the International Social Survey Programme (ISSP). The benchmark aggregates over 480,000 real respondent records from more than 30 countries, spanning 10 sociological domains and over 40 demographic attributes.
- Score: 32.66051406264919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) show strong potential for simulating human social behaviors and interactions, yet lack large-scale, systematically constructed benchmarks for evaluating their alignment with real-world social attitudes. To bridge this gap, we introduce SocioBench, a comprehensive benchmark derived from the annually collected, standardized survey data of the International Social Survey Programme (ISSP). The benchmark aggregates over 480,000 real respondent records from more than 30 countries, spanning 10 sociological domains and over 40 demographic attributes. Our experiments indicate that LLMs achieve only 30-40% accuracy when simulating individuals in complex survey scenarios, with statistically significant differences across domains and demographic subgroups. These findings highlight several limitations of current LLMs in survey scenarios, including insufficient individual-level data coverage, inadequate scenario diversity, and missing group-level modeling.
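The evaluation described in the abstract can be pictured as a simple loop: condition a model on a respondent's demographic profile, ask it to answer a survey item, and score exact-match accuracy against the real response. The sketch below is a hypothetical illustration of that setup; the record fields, prompt wording, and `predict` stub are assumptions for demonstration, not the paper's actual interface.

```python
def build_prompt(record, question, options):
    """Assemble a persona-conditioned survey prompt (illustrative format)."""
    profile = ", ".join(f"{k}: {v}" for k, v in record["demographics"].items())
    opts = "; ".join(f"({i}) {o}" for i, o in enumerate(options))
    return (f"You are a survey respondent with this profile: {profile}.\n"
            f"Question: {question}\nOptions: {opts}\n"
            f"Answer with the option number only.")

def predict(prompt):
    # Stand-in for an LLM call; a real run would query a model here
    # and parse the chosen option index from its reply.
    return 0

def evaluate(records, question, options):
    """Exact-match accuracy of simulated answers against real responses."""
    correct = sum(
        predict(build_prompt(r, question, options)) == r["answer"]
        for r in records
    )
    return correct / len(records)

# Two toy respondent records with ground-truth answers.
records = [
    {"demographics": {"age": 34, "country": "DE"}, "answer": 0},
    {"demographics": {"age": 52, "country": "JP"}, "answer": 2},
]
acc = evaluate(records, "How important is work in your life?",
               ["Very important", "Somewhat important", "Not important"])
print(acc)  # 0.5 with this stub predictor
```

With a real model behind `predict`, per-domain and per-subgroup accuracies computed this way are what the 30-40% figure summarizes.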
Related papers
- LiveCultureBench: a Multi-Agent, Multi-Cultural Benchmark for Large Language Models in Dynamic Social Simulations [63.478832978278014]
Large language models (LLMs) are increasingly deployed as autonomous agents, yet evaluations focus primarily on task success rather than cultural appropriateness or evaluator reliability. We introduce LiveCultureBench, a multi-cultural, dynamic benchmark that embeds LLMs as agents in a simulated town and evaluates them on both task completion and adherence to socio-cultural norms.
arXiv Detail & Related papers (2026-03-02T15:04:16Z) - Beyond Static Snapshots: Dynamic Modeling and Forecasting of Group-Level Value Evolution with Large Language Models [24.234813956858577]
Social simulation is critical for mining complex social dynamics and supporting data-driven decision making. Existing LLM-based approaches predominantly focus on group-level values at discrete time points. We propose a novel framework for group-level dynamic social simulation by integrating historical value trajectories into LLM-based human response modeling.
arXiv Detail & Related papers (2026-02-15T08:14:55Z) - AlignSurvey: A Comprehensive Benchmark for Human Preferences Alignment in Social Surveys [14.699937408707356]
We introduce AlignSurvey, the first benchmark that systematically replicates and evaluates the full social survey pipeline. It defines four tasks aligned with key survey stages: social role modeling, semi-structured interview modeling, attitude stance modeling, and survey response modeling. It also provides task-specific evaluation metrics to assess alignment fidelity, consistency, and fairness at both individual and group levels.
arXiv Detail & Related papers (2025-11-11T06:14:21Z) - Synthetic social data: trials and tribulations [3.713365412512855]
We explore the statistical representation of social values across four countries for six Large Language Models. We compare machine-generated outputs with actual human survey data. Our findings suggest that, despite the logistical and financial constraints of human surveys, even a small, skewed sample of real respondents may provide more reliable insights.
arXiv Detail & Related papers (2025-10-22T18:25:42Z) - Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation. Our approach begins by leveraging large language models to generate narrative personas from long-term social media data. To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
arXiv Detail & Related papers (2025-09-12T10:43:47Z) - Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation [18.225151370273093]
This paper explores a new paradigm: simulating virtual survey respondents using Large Language Models (LLMs). We introduce two novel simulation settings, namely Partial Attribute Simulation (PAS) and Full Attribute Simulation (FAS). We curate a comprehensive benchmark suite, LLM-S3 (Large Language Model-based Sociodemographic Simulation Survey), that spans 11 real-world public datasets across four sociological domains.
arXiv Detail & Related papers (2025-09-08T04:59:00Z) - Ireland in 2057: Projections using a Geographically Diverse Dynamic Microsimulation [4.230271396864462]
The model captures four primary events: births, deaths, internal migration, and international migration. Each individual in the simulation is defined by five core attributes: age, sex, marital status, highest level of education attained, and economic status.
arXiv Detail & Related papers (2025-09-01T13:03:03Z) - MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework [53.82097200295448]
Mean-Field LLM (MF-LLM) is the first to incorporate mean field theory into social simulation. MF-LLM models bidirectional interactions between individuals and the population through an iterative process. IB-Tune is a novel fine-tuning method inspired by the Information Bottleneck principle.
arXiv Detail & Related papers (2025-04-30T12:41:51Z) - SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users [70.02370111025617]
We introduce SocioVerse, an agent-driven world model for social simulation. Our framework features four powerful alignment components and a user pool of 10 million real individuals. Results demonstrate that SocioVerse can reflect large-scale population dynamics while ensuring diversity, credibility, and representativeness.
arXiv Detail & Related papers (2025-04-14T12:12:52Z) - Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions. As a testbed, we use country-level results from two global cultural surveys. We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
arXiv Detail & Related papers (2025-02-10T21:59:27Z) - ChatGPT vs Social Surveys: Probing Objective and Subjective Silicon Population [7.281887764378982]
Large Language Models (LLMs) have the potential to simulate human responses in social surveys and generate reliable predictions. We employ repeated random sampling to create sampling distributions that identify the population parameters of silicon samples generated by GPT. Our findings show that GPT's demographic distribution aligns with the 2020 U.S. population in terms of gender and average age. GPT's point estimates for attitudinal scores are highly inconsistent and show no clear inclination toward any particular ideology.
arXiv Detail & Related papers (2024-09-04T10:33:37Z) - Social Debiasing for Fair Multi-modal LLMs [59.61512883471714]
Multi-modal Large Language Models (MLLMs) have dramatically advanced the research field and delivered powerful vision-language understanding capabilities. These models often inherit deep-rooted social biases from their training data, leading to uncomfortable responses with respect to attributes such as race and gender. This paper addresses the issue of social biases in MLLMs by introducing a comprehensive counterfactual dataset with multiple social concepts.
arXiv Detail & Related papers (2024-08-13T02:08:32Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs). We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing. We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.