Beyond Marginal Distributions: A Framework to Evaluate the Representativeness of Demographic-Aligned LLMs
- URL: http://arxiv.org/abs/2601.15755v1
- Date: Thu, 22 Jan 2026 08:45:55 GMT
- Title: Beyond Marginal Distributions: A Framework to Evaluate the Representativeness of Demographic-Aligned LLMs
- Authors: Tristan Williams, Franziska Weeber, Sebastian Padó, Alan Akbik
- Abstract summary: We propose a framework for evaluating the representativeness of aligned models. We show the value of our evaluation scheme by comparing two model steering techniques. We conclude that representativeness is a distinct aspect of value alignment.
- Score: 13.630995219491972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models are increasingly used to represent human opinions, values, or beliefs, and their steerability towards these ideals is an active area of research. Existing work focuses predominantly on aligning marginal response distributions, treating each survey item independently. While essential, this may overlook deeper latent structures that characterise real populations and underpin cultural values theories. We propose a framework for evaluating the representativeness of aligned models through multivariate correlation patterns in addition to marginal distributions. We show the value of our evaluation scheme by comparing two model steering techniques (persona prompting and demographic fine-tuning) and evaluating them against human responses from the World Values Survey. While the demographically fine-tuned model better approximates marginal response distributions than persona prompting, both techniques fail to fully capture the gold standard correlation patterns. We conclude that representativeness is a distinct aspect of value alignment and an evaluation focused on marginals can mask structural failures, leading to overly optimistic conclusions about model capabilities.
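To make the distinction concrete, here is a minimal sketch (our illustration, not the authors' code) of evaluating an aligned model on both criteria, assuming responses to several survey items are coded as ordinal Likert values: per-item marginal alignment via the 1-Wasserstein distance, and structural alignment via the distance between Spearman correlation matrices.

```python
# Illustrative sketch, not the authors' code. Assumes responses to
# n_items survey questions are coded as ordinal values (e.g., 1-5).
import numpy as np
from scipy.stats import spearmanr, wasserstein_distance

def marginal_alignment(human: np.ndarray, model: np.ndarray) -> float:
    """Mean per-item 1-Wasserstein distance between response distributions.

    human, model: (n_respondents, n_items) matrices of ordinal responses.
    0 means identical marginals on every item; lower is better.
    """
    return float(np.mean([
        wasserstein_distance(human[:, j], model[:, j])
        for j in range(human.shape[1])
    ]))

def correlation_alignment(human: np.ndarray, model: np.ndarray) -> float:
    """Frobenius distance between the two Spearman correlation matrices,
    capturing the multivariate structure that per-item checks miss."""
    r_human, _ = spearmanr(human)
    r_model, _ = spearmanr(model)
    return float(np.linalg.norm(r_human - r_model, ord="fro"))

# Toy demonstration: shuffling each column independently preserves every
# marginal exactly but destroys the inter-item correlation structure.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))  # shared trait -> correlated items
human = np.clip(np.round(latent + rng.normal(size=(500, 4)) + 3), 1, 5)
shuffled = human.copy()
for j in range(shuffled.shape[1]):
    rng.shuffle(shuffled[:, j])
print(marginal_alignment(human, shuffled))     # 0.0: marginals identical
print(correlation_alignment(human, shuffled))  # large: structure lost
```

A marginals-only evaluation would score the shuffled sample as a perfect population model, which is exactly the failure mode the correlation-based check is meant to expose.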
Related papers
- Toward Culturally Aligned LLMs through Ontology-Guided Multi-Agent Reasoning [6.102462703832761]
We propose OG-MAR, an Ontology-Guided Multi-Agent Reasoning framework. OG-MAR summarizes respondent-specific values from the World Values Survey (WVS). It constructs a global cultural ontology by eliciting relations over a fixed taxonomy via competency questions. At inference time, it retrieves demographically similar profiles to instantiate multiple value-persona agents.
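A hedged sketch of the retrieval step described above (the function name and the vector encoding of demographics are our assumptions, not from the OG-MAR paper):

```python
# Hedged illustration of profile retrieval; names and the numeric encoding
# of demographics (age, region, education, ...) are hypothetical.
import numpy as np

def retrieve_similar_profiles(target: np.ndarray,
                              profiles: np.ndarray,
                              k: int = 5) -> np.ndarray:
    """Indices of the k respondent profiles nearest to `target` in an
    encoded demographic space; each hit would seed one value-persona
    agent at inference time."""
    dists = np.linalg.norm(profiles - target, axis=1)
    return np.argsort(dists)[:k]
```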
arXiv Detail & Related papers (2026-01-29T13:31:45Z)
- RoleRMBench & RoleRM: Towards Reward Modeling for Profile-Based Role Play in Dialogue Systems [85.16327248973387]
We develop RoleRM, a reward model trained with Continuous Implicit Preferences (CIP). We show RoleRM surpasses strong open- and closed-source reward models by over 24% on average. Our findings highlight the importance of continuous preference representation and annotation consistency, establishing a foundation for subjective alignment in human-centered dialogue systems.
arXiv Detail & Related papers (2025-12-11T12:04:46Z)
- Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models [93.1043186636177]
We explore the hypothesis that people use a combination of distributed and symbolic representations to construct bespoke mental models tailored to novel situations. We propose a computational implementation of this idea: a "Model Synthesis Architecture" (MSA). We evaluate our MSA as a model of human judgments on a novel reasoning dataset.
arXiv Detail & Related papers (2025-07-16T18:01:03Z)
- Fair Deepfake Detectors Can Generalize [51.21167546843708]
We show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art detectors.
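The two interventions named under i) can be sketched as follows (an illustration under our own assumptions, not the DAID reference code):

```python
# Sketch under our own assumptions (not the DAID reference code):
# demographic subgroups are integer labels aligned with feature rows.
import numpy as np

def inverse_propensity_weights(groups: np.ndarray) -> np.ndarray:
    """Weight each sample by 1 / P(its subgroup), so rare demographic
    subgroups contribute as much to training as common ones."""
    _, inverse, counts = np.unique(groups, return_inverse=True,
                                   return_counts=True)
    return len(groups) / counts[inverse].astype(float)

def subgroup_normalize(feats: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Standardize features within each subgroup, removing
    subgroup-specific shifts and scales from the representation."""
    out = feats.astype(float).copy()
    for g in np.unique(groups):
        mask = groups == g
        mu = out[mask].mean(axis=0)
        sd = out[mask].std(axis=0) + 1e-8
        out[mask] = (out[mask] - mu) / sd
    return out
```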
arXiv Detail & Related papers (2025-07-03T14:10:02Z)
- Preference Learning for AI Alignment: a Causal Perspective [55.2480439325792]
We frame this problem in a causal paradigm, providing the rich toolbox of causality to identify persistent challenges. Drawing on the causal inference literature, we identify key assumptions necessary for reliable generalisation. We illustrate failure modes of naive reward models and demonstrate how causally-inspired approaches can improve model robustness.
arXiv Detail & Related papers (2025-06-06T10:45:42Z)
- INFELM: In-depth Fairness Evaluation of Large Text-To-Image Models [8.340794604348632]
Multi-modal AI systems have potential for industrial applications by emulating human-like cognition. They also pose significant ethical challenges, including amplifying harmful content and reinforcing societal biases. This paper presents INFELM, an in-depth fairness evaluation of widely-used text-to-image models.
arXiv Detail & Related papers (2024-12-28T02:28:19Z)
- Comparing Fairness of Generative Mobility Models [3.699135947901772]
This work examines the fairness of generative mobility models, addressing the often overlooked dimension of equity in model performance across geographic regions.
Predictive models built on crowd flow data are instrumental in understanding urban structures and movement patterns.
We propose a novel framework for assessing fairness by measuring utility and equity of generated traces.
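One plausible instantiation of utility-and-equity scoring (illustrative only; these metric definitions are our assumptions, not the paper's): per-region utility as real-versus-generated flow agreement, and equity as the spread of that utility across regions.

```python
# Illustrative metrics only; not the paper's exact definitions. Flows are
# per-region vectors of trip counts between locations.
import numpy as np

def per_region_utility(real: dict[str, np.ndarray],
                       gen: dict[str, np.ndarray]) -> dict[str, float]:
    """Pearson correlation between real and generated flows, per region."""
    return {r: float(np.corrcoef(real[r], gen[r])[0, 1]) for r in real}

def equity_gap(utilities: dict[str, float]) -> float:
    """Spread between best- and worst-served regions: 0 is perfectly
    equitable; large values mean some regions are modeled far worse."""
    vals = list(utilities.values())
    return max(vals) - min(vals)
```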
arXiv Detail & Related papers (2024-11-07T06:01:12Z)
- Using LLMs to Model the Beliefs and Preferences of Targeted Populations [4.0849074543032105]
We consider the problem of aligning a large language model (LLM) to model the preferences of a human population.
Modeling the beliefs, preferences, and behaviors of a specific population can be useful for a variety of different applications.
arXiv Detail & Related papers (2024-03-29T15:58:46Z)
- GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show that GREAT Score correlates highly with attack-based model rankings on RobustBench while incurring significantly lower cost.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
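A minimal sketch of the precision-recall side of such a diagnosis (our simplification in the style of k-NN support estimates, not the paper's metric): precision asks whether synthetic samples fall inside the real-data support (fidelity), and recall asks the reverse (diversity).

```python
# Simplified k-NN support estimate in the spirit of precision-recall
# analysis for generative models; not the paper's metric.
import numpy as np

def knn_radii(points: np.ndarray, k: int = 3) -> np.ndarray:
    """Distance from each point to its k-th nearest neighbor in the set."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the point itself

def support_coverage(queries: np.ndarray, refs: np.ndarray,
                     k: int = 3) -> float:
    """Fraction of queries landing inside some reference's k-NN ball."""
    radii = knn_radii(refs, k)
    d = np.linalg.norm(queries[:, None, :] - refs[None, :, :], axis=-1)
    return float((d <= radii[None, :]).any(axis=1).mean())

real = np.random.default_rng(1).normal(size=(200, 8))
fake = 0.5 * np.random.default_rng(2).normal(size=(200, 8))  # mode-collapsed
print(support_coverage(fake, real))  # precision: typically high here
print(support_coverage(real, fake))  # recall: lower, coverage is poor
```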
arXiv Detail & Related papers (2021-02-17T18:25:30Z)