Benchmarking LLMs for Political Science: A United Nations Perspective
- URL: http://arxiv.org/abs/2502.14122v1
- Date: Wed, 19 Feb 2025 21:51:01 GMT
- Title: Benchmarking LLMs for Political Science: A United Nations Perspective
- Authors: Yueqing Liang, Liangwei Yang, Chen Wang, Congying Xia, Rui Meng, Xiongxiao Xu, Haoran Wang, Ali Payani, Kai Shu
- Abstract summary: Large Language Models (LLMs) have achieved significant advances in natural language processing, yet their potential for high-stakes political decision-making remains largely unexplored.
This paper addresses the gap by focusing on the application of LLMs to the United Nations (UN) decision-making process.
We introduce a novel dataset comprising publicly available UN Security Council (UNSC) records from 1994 to 2024, including draft resolutions, voting records, and diplomatic speeches.
- Abstract: Large Language Models (LLMs) have achieved significant advances in natural language processing, yet their potential for high-stakes political decision-making remains largely unexplored. This paper addresses that gap by focusing on the application of LLMs to the United Nations (UN) decision-making process, where the stakes are particularly high and political decisions can have far-reaching consequences. We introduce a novel dataset comprising publicly available UN Security Council (UNSC) records from 1994 to 2024, including draft resolutions, voting records, and diplomatic speeches. Using this dataset, we propose the United Nations Benchmark (UNBench), the first comprehensive benchmark designed to evaluate LLMs across four interconnected political science tasks: co-penholder judgment, representative voting simulation, draft adoption prediction, and representative statement generation. These tasks span the three stages of the UN decision-making process (drafting, voting, and discussing) and aim to assess LLMs' ability to understand and simulate political dynamics. Our experimental analysis demonstrates the potential and challenges of applying LLMs in this domain, providing insights into their strengths and limitations in political science. This work contributes to the growing intersection of AI and political science, opening new avenues for research and practical applications in global governance. The UNBench repository can be accessed at: https://github.com/yueqingliang1/UNBench.
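To make the task format concrete, here is a minimal Python sketch of how the representative voting simulation task could be posed to an LLM. The record fields, prompt wording, and function names are illustrative assumptions rather than the authors' implementation; the actual data format and evaluation code are in the UNBench repository linked above.

```python
# Hypothetical sketch of UNBench's representative voting simulation task.
# Field names and prompt wording are assumptions for illustration only;
# see https://github.com/yueqingliang1/UNBench for the real benchmark.
from dataclasses import dataclass


@dataclass
class DraftResolution:
    """Minimal stand-in for a UNSC draft resolution record."""
    title: str
    text: str
    year: int


def build_voting_prompt(draft: DraftResolution, country: str) -> str:
    """Frame the vote as a constrained role-playing prompt."""
    return (
        f"You are the representative of {country} on the UN Security "
        f"Council in {draft.year}. Given the draft resolution below, "
        "reply with exactly one of: FAVOUR, AGAINST, ABSTAIN.\n\n"
        f"Title: {draft.title}\n\nText:\n{draft.text}\n\nVote:"
    )


def parse_vote(completion: str) -> str:
    """Map a free-form model completion onto the three UNSC vote options."""
    normalized = completion.strip().upper()
    for option in ("FAVOUR", "AGAINST", "ABSTAIN"):
        if option in normalized:
            return option
    return "ABSTAIN"  # conservative fallback for unparseable output


if __name__ == "__main__":
    draft = DraftResolution(
        title="Example draft on a peacekeeping mandate renewal",
        text="The Security Council ... decides to extend the mandate ...",
        year=2015,
    )
    print(build_voting_prompt(draft, "France"))
```

Under this framing, agreement between the parsed votes and the recorded UNSC votes would be a natural task metric, though the paper's own evaluation protocol should be taken from the repository.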
Related papers
- Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach [14.32199539218175]
This paper proposes an adaptable Large Language Model (LLM)-driven online testing framework to explore critical and diverse testing scenarios.
Specifically, we design a "generate-test-feedback" pipeline with templated prompt engineering to harness the world knowledge and reasoning abilities of LLMs.
arXiv Detail & Related papers (2024-12-09T17:27:04Z)
- Political-LLM: Large Language Models in Political Science [159.95299889946637]
Large language models (LLMs) have been widely adopted in political science tasks.
Political-LLM aims to advance the comprehensive understanding of integrating LLMs into computational political science.
arXiv Detail & Related papers (2024-12-09T08:47:50Z)
- Large Language Models in Politics and Democracy: A Comprehensive Survey [0.0]
Large language models (LLMs) offer potential across various domains, including policymaking, political communication, analysis, and governance.
LLMs offer opportunities to enhance efficiency, inclusivity, and decision-making in political processes.
They also present challenges related to bias, transparency, and accountability.
arXiv Detail & Related papers (2024-12-01T15:23:34Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models [33.251235538905895]
This paper introduces a novel approach to evaluating presidential debate performances using large language models.
We propose a framework that analyzes candidates' "Policies, Persona, and Perspective" (3P) and how they resonate with the "Interests, Ideologies, and Identity" (3I) of four key audience groups.
Our method employs large language models to generate the LLM-POTUS Score, a quantitative measure of debate performance.
arXiv Detail & Related papers (2024-09-12T15:40:45Z)
- Large language models can consistently generate high-quality content for election disinformation operations [2.98293101034582]
Large language models have raised concerns about their potential use in generating compelling election disinformation at scale.
This study presents a two-part investigation into the capabilities of LLMs to automate stages of an election disinformation operation.
arXiv Detail & Related papers (2024-08-13T08:45:34Z)
- A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law [65.87885628115946]
Large language models (LLMs) are revolutionizing the landscapes of finance, healthcare, and law.
We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies.
We critically examine the ethics of LLM applications in these fields, pointing out existing ethical concerns and the need for transparent, fair, and robust AI systems.
arXiv Detail & Related papers (2024-05-02T22:43:02Z)
- Character is Destiny: Can Role-Playing Language Agents Make Persona-Driven Decisions? [59.0123596591807]
We benchmark the ability of Large Language Models (LLMs) in persona-driven decision-making.
We investigate whether LLMs can predict characters' decisions given the preceding story in high-quality novels.
The results demonstrate that state-of-the-art LLMs exhibit promising capabilities in this task, yet substantial room for improvement remains.
arXiv Detail & Related papers (2024-04-18T12:40:59Z)
- ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models [78.08792285698853]
We present a large-scale empirical study on general language ability evaluation of pretrained language models (ElitePLM).
Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; and (3) PLMs have excellent transferability between similar tasks.
arXiv Detail & Related papers (2022-05-03T14:18:10Z)