What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models
- URL: http://arxiv.org/abs/2509.03827v1
- Date: Thu, 04 Sep 2025 02:28:58 GMT
- Title: What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models
- Authors: Pierre Le Coz, Jia An Liu, Debarun Bhattacharjya, Georgina Curto, Serge Stinckwich,
- Abstract summary: This article evaluates whether large language models (LLMs) are aligned with domain experts to inform social policymaking on the subject of homelessness alleviation. We develop a novel benchmark consisting of decision scenarios with policy choices across four geographies. We present an automated pipeline that connects the benchmarked policies to an agent-based model, and we explore the social impact of the recommended policies through simulated social scenarios.
- Score: 13.022045946656661
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) are increasingly being adopted in high-stakes domains. Their capacity to process vast amounts of unstructured data, explore flexible scenarios, and handle a diversity of contextual factors can make them uniquely suited to provide new insights into the complexity of social policymaking. This article evaluates whether LLMs are aligned with domain experts (and among themselves) to inform social policymaking on the subject of homelessness alleviation, a challenge affecting over 150 million people worldwide. We develop a novel benchmark consisting of decision scenarios with policy choices across four geographies (South Bend, USA; Barcelona, Spain; Johannesburg, South Africa; Macau SAR, China). The policies in scope are grounded in the conceptual framework of the Capability Approach for human development. We also present an automated pipeline that connects the benchmarked policies to an agent-based model, and we explore the social impact of the recommended policies through simulated social scenarios. The results reveal promising potential to leverage LLMs for social policymaking. If responsible guardrails and contextual calibrations are introduced in collaboration with local domain experts, LLMs can provide humans with valuable insights in the form of alternative policies at scale.
Related papers
- Towards Harnessing the Power of LLMs for ABAC Policy Mining [0.0468771281852187]
This paper presents an empirical investigation into the capabilities of Large Language Models (LLMs) to perform automated Attribute-based Access Control (ABAC) policy mining. We evaluate the performance of some of the state-of-the-art LLMs, specifically Google Gemini (Flash and Pro) and OpenAI ChatGPT, as potential policy mining engines.
(2025-11-22T15:49:36Z)
- Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare [87.06241096619112]
Large language models (LLMs) are increasingly entrusted with high-stakes decisions that affect human welfare. We introduce the Social Welfare Function Benchmark, a dynamic simulation environment where an LLM acts as a sovereign allocator. We evaluate 20 state-of-the-art LLMs and present the first leaderboard for social welfare allocation.
(2025-10-01T17:52:31Z)
- Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation. Our approach begins by leveraging large language models to generate narrative personas from long-term social media data. To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
(2025-09-12T10:43:47Z)
- Can Large Language Models Become Policy Refinement Partners? Evidence from China's Social Security Studies [10.816677663320547]
This study investigates the capability boundaries and performance characteristics of large language models (LLMs) in generating policy recommendations for China's social security issues. LLMs face significant limitations in addressing complex social dynamics, balancing stakeholder interests, and controlling fiscal risks within the social security domain. DeepSeek-R1 outperforms GPT-4o across all evaluation dimensions in policy recommendation generation.
(2025-04-12T08:50:12Z)
- Large language models in climate and sustainability policy: limits and opportunities [1.4843690728082002]
We apply different NLP techniques, tools and approaches to climate and sustainability documents to derive policy-relevant and actionable measures. We find that the use of LLMs is successful at processing, classifying and summarizing heterogeneous text-based data. Our work presents a critical but empirically grounded application of LLMs to complex policy problems and suggests avenues to further expand Artificial Intelligence-powered computational social sciences.
(2025-02-04T10:13:14Z)
- Political-LLM: Large Language Models in Political Science [159.95299889946637]
Large language models (LLMs) have been widely adopted in political science tasks. Political-LLM aims to advance the comprehensive understanding of integrating LLMs into computational political science.
(2024-12-09T08:47:50Z)
- Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
(2024-11-11T10:05:52Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
(2024-10-24T04:02:30Z)
- AI Language Models Could Both Help and Harm Equity in Marine Policymaking: The Case Study of the BBNJ Question-Answering Bot [3.643615070316831]
Large Language Models (LLMs) like ChatGPT are set to reshape some aspects of policymaking processes.
We are cautiously hopeful that LLMs could be used to promote a marginally more balanced footing among decision makers in policy negotiations.
However, the risks are particularly concerning for environmental and marine policy uses.
(2024-03-04T06:21:02Z)
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
(2023-11-15T00:02:25Z)
- Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations.
We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes.
(2021-08-06T01:30:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.