The Global Representativeness Index: A Total Variation Distance Framework for Measuring Demographic Fidelity in Survey Research
- URL: http://arxiv.org/abs/2602.14835v2
- Date: Thu, 19 Feb 2026 23:43:17 GMT
- Title: The Global Representativeness Index: A Total Variation Distance Framework for Measuring Demographic Fidelity in Survey Research
- Authors: Evan Hadfield, Andrew Konya,
- Abstract summary: Survey research increasingly informs high-stakes decisions in AI governance and cross-cultural policy.<n>No standardized metric quantifies how well a sample's demographic composition matches its target population.<n>This paper introduces the Global Representativeness Index (GRI), a framework grounded in Total Variation Distance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Global survey research increasingly informs high-stakes decisions in AI governance and cross-cultural policy, yet no standardized metric quantifies how well a sample's demographic composition matches its target population. Response rates and demographic quotas -- the prevailing proxies for sample quality -- measure effort and coverage but not distributional fidelity. This paper introduces the Global Representativeness Index (GRI), a framework grounded in Total Variation Distance that scores any survey sample against population benchmarks across multiple demographic dimensions on a [0, 1] scale. Validation on seven waves of the Global Dialogues survey (N = 7,500 across 60+ countries) finds fine-grained demographic GRI scores of only 0.33--0.36 -- roughly 43% of the theoretical maximum at that sample size. Cross-validation on the World Values Survey (seven waves, N = 403,000), Afrobarometer Round 9 (N = 53,000), and Latinobarometro (N = 19,000) reveals that even large probability surveys score below 0.22 on fine-grained global demographics when country coverage is limited. The GRI connects to classical survey statistics through the design effect; both metrics are recommended as a minimum summary of sample quality, since GRI quantifies demographic distance symmetrically while effective N captures the asymmetric inferential cost of underrepresentation. The framework is released as an open-source Python library with UN and Pew Research Center population benchmarks, applicable to survey research, machine learning dataset auditing, and AI evaluation benchmarks.
Related papers
- Forecasting Antimicrobial Resistance Trends Using Machine Learning on WHO GLASS Surveillance Data: A Retrieval-Augmented Generation Approach for Policy Decision Support [0.0]
Antimicrobial resistance (AMR) is a growing global crisis projected to cause 10 million deaths per year by 2050.<n>This paper presents a two-component framework for AMR trend forecasting and evidence-grounded policy decision support.<n>XGBoost achieved the best performance with a test MAE of 7.07% and R-squared of 0.854, outperforming the naive baseline by 83.1%.
arXiv Detail & Related papers (2026-02-26T06:45:08Z) - SocioBench: Modeling Human Behavior in Sociological Surveys with Large Language Models [32.66051406264919]
Large language models (LLMs) show strong potential for simulating human social behaviors and interactions, yet lack large-scale, systematically constructed benchmarks for evaluating their alignment with real-world social attitudes.<n>We introduce SocioBench-a comprehensive benchmark derived from the annually collected, standardized survey data of the International Social Survey Programme (ISSP)<n>The benchmark aggregates over 480,000 real respondent records from more than 30 countries, spanning 10 sociological domains and over 40 demographic attributes.
arXiv Detail & Related papers (2025-10-13T08:22:20Z) - EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models [1.141545154221656]
EvalMORAAL is a transparent chain-of-thought framework to evaluate moral alignment in 20 large language models.<n>We assess models on the World Values Survey (55 countries, 19 topics) and the PEW Global Attitudes Survey (39 countries, 8 topics)
arXiv Detail & Related papers (2025-10-07T13:52:16Z) - WorldPM: Scaling Human Preference Modeling [130.23230492612214]
We propose World Preference Modeling$ (WorldPM) to emphasize this scaling potential.<n>We collect preference data from public forums covering diverse user communities.<n>We conduct extensive training using 15M-scale data across models ranging from 1.5B to 72B parameters.
arXiv Detail & Related papers (2025-05-15T17:38:37Z) - Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions.<n>As a testbed, we use country-level results from two global cultural surveys.<n>We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
arXiv Detail & Related papers (2025-02-10T21:59:27Z) - ChatGPT vs Social Surveys: Probing Objective and Subjective Silicon Population [7.281887764378982]
Large Language Models (LLMs) have the potential to simulate human responses in social surveys and generate reliable predictions.<n>We employ repeated random sampling to create sampling distributions that identify the population parameters of silicon samples generated by GPT.<n>Our findings show that GPT's demographic distribution aligns with the 2020 U.S. population in terms of gender and average age.<n> GPT's point estimates for attitudinal scores are highly inconsistent and show no clear inclination toward any particular ideology.
arXiv Detail & Related papers (2024-09-04T10:33:37Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z) - Using Sampling to Estimate and Improve Performance of Automated Scoring
Systems with Guarantees [63.62448343531963]
We propose a combination of the existing paradigms, sampling responses to be scored by humans intelligently.
We observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget.
arXiv Detail & Related papers (2021-11-17T05:00:51Z) - Magnify Your Population: Statistical Downscaling to Augment the Spatial
Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.