AlignSurvey: A Comprehensive Benchmark for Human Preferences Alignment in Social Surveys
- URL: http://arxiv.org/abs/2511.07871v2
- Date: Fri, 14 Nov 2025 01:56:21 GMT
- Title: AlignSurvey: A Comprehensive Benchmark for Human Preferences Alignment in Social Surveys
- Authors: Chenxi Lin, Weikang Yuan, Zhuoren Jiang, Biao Huang, Ruitao Zhang, Jianan Ge, Yueqian Xu, Jianxing Yu,
- Abstract summary: We introduce AlignSurvey, the first benchmark that systematically replicates and evaluates the full social survey pipeline. It defines four tasks aligned with key survey stages: social role modeling, semi-structured interview modeling, attitude stance modeling, and survey response modeling. It also provides task-specific evaluation metrics to assess alignment fidelity, consistency, and fairness at both individual and group levels.
- Score: 14.699937408707356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding human attitudes, preferences, and behaviors through social surveys is essential for academic research and policymaking. Yet traditional surveys face persistent challenges, including fixed-question formats, high costs, limited adaptability, and difficulties ensuring cross-cultural equivalence. While recent studies explore large language models (LLMs) to simulate survey responses, most are limited to structured questions, overlook the entire survey process, and risk under-representing marginalized groups due to training data biases. We introduce AlignSurvey, the first benchmark that systematically replicates and evaluates the full social survey pipeline using LLMs. It defines four tasks aligned with key survey stages: social role modeling, semi-structured interview modeling, attitude stance modeling, and survey response modeling. It also provides task-specific evaluation metrics to assess alignment fidelity, consistency, and fairness at both individual and group levels, with a focus on demographic diversity. To support AlignSurvey, we construct a multi-tiered dataset architecture: (i) the Social Foundation Corpus, a cross-national resource with 44K+ interview dialogues and 400K+ structured survey records; and (ii) a suite of Entire-Pipeline Survey Datasets, including the expert-annotated AlignSurvey-Expert (ASE) and two nationally representative surveys for cross-cultural evaluation. We release the SurveyLM family, obtained through two-stage fine-tuning of open-source LLMs, and offer reference models for evaluating domain-specific alignment. All datasets, models, and tools are available on GitHub and Hugging Face to support transparent and socially responsible research.
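The abstract names group-level fairness metrics without giving their formulas. As one illustrative sketch of such a check, the snippet below compares real and simulated answer distributions per demographic group with Jensen-Shannon divergence; the metric choice, function names, and grouping scheme are assumptions for illustration, not the paper's published definitions.

```python
import math
from collections import Counter

def distribution(answers, options):
    """Normalized answer distribution over a fixed option set."""
    counts = Counter(answers)
    total = sum(counts[o] for o in options) or 1
    return [counts[o] / total for o in options]

def js_divergence(p, q):
    """Jensen-Shannon divergence (bits) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def group_alignment(real, simulated, options):
    """Per-group divergence between real and simulated responses.

    `real` and `simulated` map a group label to a list of answers;
    lower scores mean closer alignment for that group.
    """
    return {g: js_divergence(distribution(real[g], options),
                             distribution(simulated[g], options))
            for g in real}

# Toy check: identical answer distributions give zero divergence.
real = {"18-29": ["agree", "agree", "disagree"]}
sim = {"18-29": ["agree", "disagree", "agree"]}
scores = group_alignment(real, sim, ["agree", "disagree"])
```

A group-level gap in these scores (e.g., low divergence for majority groups but high divergence for marginalized ones) is exactly the kind of fairness signal the benchmark's metrics are described as capturing.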
Related papers
- The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality [70.45240108873001]
The FACTS Leaderboard is an online leaderboard suite that comprehensively evaluates the ability of language models to generate factually accurate text. The suite provides a holistic measure of factuality by aggregating the performance of models on four distinct sub-leaderboards.
arXiv Detail & Related papers (2025-12-11T16:35:14Z) - Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses [3.8293581919117123]
Large language models (LLMs) excel at few-shot reasoning over open-ended text. Current retrieval and survey analysis tools are typically designed for humans in the workflow. We introduce QASU, a benchmark that probes six structural skills, including answer lookup, respondent count, and multi-hop inference. Experiments show that choosing an effective format and prompt combination can improve accuracy by up to 8.8 percentage points over suboptimal formats.
arXiv Detail & Related papers (2025-10-30T08:18:37Z) - SocioBench: Modeling Human Behavior in Sociological Surveys with Large Language Models [32.66051406264919]
Large language models (LLMs) show strong potential for simulating human social behaviors and interactions, yet lack large-scale, systematically constructed benchmarks for evaluating their alignment with real-world social attitudes. We introduce SocioBench, a comprehensive benchmark derived from the annually collected, standardized survey data of the International Social Survey Programme (ISSP). The benchmark aggregates over 480,000 real respondent records from more than 30 countries, spanning 10 sociological domains and over 40 demographic attributes.
arXiv Detail & Related papers (2025-10-13T08:22:20Z) - Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble [46.82793004650415]
Large language models (LLMs) have demonstrated promise in emulating human-like responses across a range of tasks. We propose a novel alignment framework that treats LLMs as agent proxies for human survey respondents. We introduce P2P, a system that steers LLM agents toward representative behavioral patterns using structured prompt engineering, entropy-based sampling, and regression-based selection.
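The abstract names entropy-based sampling without further detail. One plausible reading, sketched below under stated assumptions, is to keep only candidate agent configurations whose response entropy is close to the target population's; the function names, tolerance threshold, and filtering rule are illustrative, not the P2P authors' actual procedure.

```python
import math
from collections import Counter

def entropy(dist):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def response_entropy(answers, options):
    """Entropy of the empirical answer distribution over `options`."""
    counts = Counter(answers)
    total = sum(counts[o] for o in options) or 1
    return entropy([counts[o] / total for o in options])

def entropy_filter(agent_answers, target_answers, options, tol=0.2):
    """Keep agent configurations whose response entropy lies within
    `tol` bits of the target population's response entropy."""
    target_h = response_entropy(target_answers, options)
    return [name for name, answers in agent_answers.items()
            if abs(response_entropy(answers, options) - target_h) <= tol]
```

The intuition: an agent that always gives the same answer (entropy near zero) cannot represent a population whose real responses are split, so it is filtered out before any regression-based selection step.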
arXiv Detail & Related papers (2025-09-14T15:08:45Z) - Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation. Our approach begins by leveraging large language models to generate narrative personas from long-term social media data. To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
arXiv Detail & Related papers (2025-09-12T10:43:47Z) - Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation [18.225151370273093]
This paper explores a new paradigm: simulating virtual survey respondents using large language models (LLMs). We introduce two novel simulation settings, namely Partial Attribute Simulation (PAS) and Full Attribute Simulation (FAS). We curate a comprehensive benchmark suite, LLM-S3 (Large Language Model-based Sociodemographic Simulation Survey), that spans 11 real-world public datasets across four sociological domains.
arXiv Detail & Related papers (2025-09-08T04:59:00Z) - Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions. As a testbed, we use country-level results from two global cultural surveys. We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
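A first-token objective of the kind summarized above can be sketched as a divergence between the survey's empirical answer shares and the model's probability mass on each option's first token. The KL formulation and the renormalization over option tokens below are assumptions for illustration; the paper's exact loss is not reproduced here.

```python
import math

def first_token_kl(option_logprobs, survey_dist):
    """KL(survey || model) over answer options.

    option_logprobs: option -> model log-probability of the option's
                     first token given the survey prompt
    survey_dist:     option -> empirical share of real respondents
    """
    # Renormalize the model's mass over the option tokens only, since
    # the raw first-token distribution also covers unrelated tokens.
    z = sum(math.exp(lp) for lp in option_logprobs.values())
    model = {o: math.exp(lp) / z for o, lp in option_logprobs.items()}
    return sum(p * math.log(p / model[o])
               for o, p in survey_dist.items() if p > 0)
```

Minimizing this quantity during fine-tuning pushes the model's first-token distribution toward the observed response distribution; it reaches zero exactly when the renormalized token probabilities match the survey shares.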
arXiv Detail & Related papers (2025-02-10T21:59:27Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs). We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing. We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Benchmarking Foundation Models with Language-Model-as-an-Examiner [47.345760054595246]
We propose a novel benchmarking framework, Language-Model-as-an-Examiner.
The LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner.
arXiv Detail & Related papers (2023-06-07T06:29:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.