Can Large Language Models Become Policy Refinement Partners? Evidence from China's Social Security Studies
- URL: http://arxiv.org/abs/2504.09137v3
- Date: Sun, 20 Apr 2025 10:05:59 GMT
- Title: Can Large Language Models Become Policy Refinement Partners? Evidence from China's Social Security Studies
- Authors: Jinghan Ke, Zheng Zhou, Yuxuan Zhao
- Abstract summary: This study investigates the capability boundaries and performance characteristics of large language models (LLMs) in generating policy recommendations for China's social security issues. LLMs face significant limitations in addressing complex social dynamics, balancing stakeholder interests, and controlling fiscal risks within the social security domain. DeepSeek-R1 demonstrates superior performance to GPT-4o across all evaluation dimensions in policy recommendation generation.
- Score: 10.816677663320547
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid development of large language models (LLMs) is reshaping operational paradigms across multidisciplinary domains. LLMs' emergent capability to synthesize policy-relevant insights across disciplinary boundaries suggests potential as decision-support tools. However, their actual performance and suitability as policy refinement partners still require verification through rigorous and systematic evaluations. Our study employs the context-embedded generation-adaptation framework to conduct a tripartite comparison among the American GPT-4o, the Chinese DeepSeek-R1 and human researchers, investigating the capability boundaries and performance characteristics of LLMs in generating policy recommendations for China's social security issues. This study demonstrates that while LLMs exhibit distinct advantages in systematic policy design, they face significant limitations in addressing complex social dynamics, balancing stakeholder interests, and controlling fiscal risks within the social security domain. Furthermore, DeepSeek-R1 demonstrates superior performance to GPT-4o across all evaluation dimensions in policy recommendation generation, illustrating the potential of localized training to improve contextual alignment. These findings suggest that regionally-adapted LLMs can function as supplementary tools for generating diverse policy alternatives informed by domain-specific social insights. Nevertheless, the formulation of policy refinement requires integration with human researchers' expertise, which remains critical for interpreting institutional frameworks, cultural norms, and value systems.
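To make the evaluation setup concrete, here is a minimal sketch of how the generation stage of such a tripartite comparison might be wired up, assuming OpenAI-compatible chat endpoints for both models; the prompt template, model handling, and endpoint details are illustrative assumptions, not the authors' released protocol:

```python
# Minimal sketch of the generation stage of a tripartite comparison.
# The prompt template and endpoint URL are assumptions, not the paper's
# actual experimental setup.
from openai import OpenAI

CONTEXT_PROMPT = (
    "You are advising on China's social security policy.\n"
    "Background: {background}\n"
    "Issue: {issue}\n"
    "Propose a concrete, fiscally grounded policy refinement."
)

def recommend(client: OpenAI, model: str, background: str, issue: str) -> str:
    """Ask one model for a recommendation under the shared embedded context."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content":
                   CONTEXT_PROMPT.format(background=background, issue=issue)}],
    )
    return resp.choices[0].message.content

gpt = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
deepseek = OpenAI(base_url="https://api.deepseek.com", api_key="...")

# Both models see the identical context-embedded prompt, so their outputs
# can be scored on the same rubric alongside human-written recommendations.
```

Holding the context-embedded prompt fixed across models is what makes dimension-by-dimension scoring against human researchers comparable.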
Related papers
- Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces [12.671657542087624]
Policy Reasoning Traces (PRTs) are specialized generated reasoning chains that serve as a reasoning bridge to improve an LLM's policy compliance assessment capabilities. Our empirical evaluations demonstrate that the use of PRTs in both inference-time and training-time scenarios significantly enhances the performance of open-weight and commercial models.
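One plausible reading of how a PRT acts as an inference-time bridge is a two-stage prompt: generate the trace, then condition the verdict on it. The prompts and model name below are illustrative assumptions, not the paper's released templates:

```python
# Illustrative two-stage use of a Policy Reasoning Trace (PRT) at
# inference time. Prompts and model name are assumptions based on the
# abstract, not the paper's released templates.
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible endpoint; assumes API key is set

def assess_compliance(policy: str, content: str, model: str = "gpt-4o") -> str:
    # Stage 1: generate the reasoning trace bridging policy and content.
    trace = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content":
                   f"Policy:\n{policy}\n\nContent:\n{content}\n\n"
                   "Reason step by step about which policy clauses apply "
                   "to this content and why."}],
    ).choices[0].message.content

    # Stage 2: condition the final verdict on the generated trace.
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content":
                   f"Reasoning trace:\n{trace}\n\nBased only on this trace, "
                   "answer COMPLIANT or NON-COMPLIANT."}],
    ).choices[0].message.content
```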
arXiv Detail & Related papers (2025-09-27T13:10:21Z)
- What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models [13.022045946656661]
This article evaluates whether large language models (LLMs) are aligned with domain experts to inform social policymaking on the subject of homelessness alleviation. We develop a novel benchmark composed of decision scenarios with policy choices across four geographies. We present an automated pipeline that connects the benchmarked policies to an agent-based model, and we explore the social impact of the recommended policies through simulated social scenarios.
arXiv Detail & Related papers (2025-09-04T02:28:58Z)
- Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges [47.14342587731284]
This survey provides a comprehensive overview of alignment techniques, training protocols, and empirical findings in large language model (LLM) alignment. We analyze the development of alignment methods across diverse paradigms, characterizing the fundamental trade-offs between core alignment objectives. We discuss state-of-the-art techniques, including Direct Preference Optimization (DPO), Constitutional AI, brain-inspired methods, and alignment uncertainty quantification (AUQ).
arXiv Detail & Related papers (2025-07-25T20:52:58Z)
- The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs [57.1838332916627]
Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), but their widespread deployment has also raised significant safety concerns. LLM-generated content can exhibit unsafe behaviors such as toxicity, bias, or misinformation, especially in adversarial contexts.
arXiv Detail & Related papers (2025-06-06T05:50:50Z)
- Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning [53.9544543607396]
We propose a novel framework that integrates reward rendering with Imitation from Observation (IfO).
By instantiating F-distance in different ways, we derive two theoretical analyses and develop a practical algorithm called Accessible State Oriented Policy Regularization (ASOR).
ASOR serves as a general add-on module that can be incorporated into various RL approaches, including offline RL and off-policy RL.
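The abstract does not spell out the loss, but the "general add-on module" shape can be sketched as a divergence penalty evaluated only on globally accessible states, added to any base RL objective. Everything below (the KL choice, names, shapes) is a hypothetical illustration rather than ASOR's actual F-distance instantiation:

```python
# Hypothetical shape of an "add-on" policy regularizer in the spirit of
# ASOR: penalize divergence from a reference policy only on states
# flagged as globally accessible. The paper's F-distance instantiations
# are not reproduced here; the KL choice below is an assumption.
import torch
import torch.nn.functional as F

def accessible_state_regularizer(policy_logits: torch.Tensor,
                                 reference_logits: torch.Tensor,
                                 accessible_mask: torch.Tensor) -> torch.Tensor:
    """KL(reference || policy), averaged over accessible states only.

    policy_logits, reference_logits: [batch, num_actions]
    accessible_mask: [batch] boolean, True for globally accessible states.
    """
    log_p = F.log_softmax(policy_logits, dim=-1)
    ref = F.softmax(reference_logits, dim=-1)
    kl = (ref * (ref.clamp_min(1e-8).log() - log_p)).sum(dim=-1)
    mask = accessible_mask.float()
    return (kl * mask).sum() / mask.sum().clamp_min(1.0)

# As an add-on: total_loss = base_rl_loss + beta * accessible_state_regularizer(...)
# which leaves the underlying offline or off-policy RL objective untouched.
```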
arXiv Detail & Related papers (2025-03-10T03:50:20Z)
- A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. Challenges such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance necessitate advanced post-training language models (PoLMs). This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms.
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
- Between Innovation and Oversight: A Cross-Regional Study of AI Risk Management Frameworks in the EU, U.S., UK, and China [0.0]
This paper conducts a comparative analysis of AI risk management strategies across the European Union, United States, United Kingdom (UK), and China.
The findings show that the EU implements a structured, risk-based framework that prioritizes transparency and conformity assessments.
The U.S. relies on decentralized, sector-specific regulations that promote innovation but may lead to fragmented enforcement.
arXiv Detail & Related papers (2025-02-25T18:52:17Z)
- Large Language Model Safety: A Holistic Survey [35.42419096859496]
The rapid development and deployment of large language models (LLMs) have introduced a new frontier in artificial intelligence. This survey provides a comprehensive overview of the current landscape of LLM safety, covering four major categories: value misalignment, robustness to adversarial attacks, misuse, and autonomous AI risks.
arXiv Detail & Related papers (2024-12-23T16:11:27Z)
- Safe Multi-agent Reinforcement Learning with Natural Language Constraints [49.01100552946231]
The role of natural language constraints in Safe Multi-agent Reinforcement Learning (MARL) is crucial, yet often overlooked.
We propose a novel approach named Safe Multi-agent Reinforcement Learning with Natural Language constraints (SMALL).
Our method leverages fine-tuned language models to interpret and process free-form textual constraints, converting them into semantic embeddings.
These embeddings are then integrated into the multi-agent policy learning process, enabling agents to learn policies that minimize constraint violations while optimizing rewards.
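A minimal sketch of that idea, with an off-the-shelf sentence encoder standing in for the paper's fine-tuned language model; the encoder choice, network sizes, and training note are assumptions, not the paper's implementation:

```python
# Sketch of the SMALL idea as summarized above: embed a free-form textual
# constraint and condition the policy on it. The encoder choice, network
# sizes, and training note are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative stand-in encoder

class ConstraintConditionedPolicy(nn.Module):
    """Maps (observation, constraint embedding) to action logits."""
    def __init__(self, obs_dim: int, emb_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + emb_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, obs: torch.Tensor, constraint_emb: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, constraint_emb], dim=-1))

constraint = "Agents must never enter the restricted zone."
emb = torch.tensor(encoder.encode(constraint)).unsqueeze(0)  # [1, 384]

policy = ConstraintConditionedPolicy(obs_dim=16, emb_dim=emb.shape[-1], num_actions=5)
logits = policy(torch.zeros(1, 16), emb)  # one agent's action logits

# Training would minimize -expected_reward + lambda * violation_cost,
# updating lambda Lagrangian-style as violations are observed (assumption).
```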
arXiv Detail & Related papers (2024-05-30T12:57:35Z)
- A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law [65.87885628115946]
Large language models (LLMs) are revolutionizing the landscapes of finance, healthcare, and law.
We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies.
We critically examine the ethics of LLM applications in these fields, pointing out existing ethical concerns and the need for transparent, fair, and robust AI systems.
arXiv Detail & Related papers (2024-05-02T22:43:02Z)
- LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models [75.89014602596673]
Strategic reasoning requires understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.
We explore the scopes, applications, methodologies, and evaluation metrics related to strategic reasoning with Large Language Models.
The survey underscores the importance of strategic reasoning as a critical cognitive capability and offers insights into future research directions and potential improvements.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence [5.147767778946168]
We critically assess 23 state-of-the-art Large Language Model (LLM) benchmarks.
Our research uncovered significant limitations, including biases, difficulties in measuring genuine reasoning, adaptability, implementation inconsistencies, prompt engineering complexity, diversity, and the overlooking of cultural and ideological norms.
arXiv Detail & Related papers (2024-02-15T11:08:10Z)
- Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review [2.780460221321639]
The paper underscores the significance of Large Language Models in reshaping recommender systems.
LLMs exhibit exceptional proficiency in recommending items, showcasing their adeptness in comprehending the intricacies of language.
Despite their transformative potential, challenges persist, including sensitivity to input prompts, occasional misinterpretations, and unforeseen recommendations.
arXiv Detail & Related papers (2024-02-11T00:24:17Z)
- Counterfactual Explanation Policies in RL [3.674863913115432]
COUNTERPOL is the first framework to analyze Reinforcement Learning policies using counterfactual explanations.
We establish a theoretical connection between COUNTERPOL and widely used trust region-based policy optimization methods in RL.
arXiv Detail & Related papers (2023-07-25T01:14:56Z)
- Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations.
We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes.
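The abstract's log-linear policies have a standard generic form, a softmax over a linear function of state features; the feature values below are illustrative, not the AI Economist's actual feature set:

```python
# Generic log-linear policy: P(a|s) is a softmax over theta . phi(s, a).
# The feature vectors below are illustrative, not the AI Economist's.
import numpy as np

def log_linear_policy(theta: np.ndarray, features: np.ndarray) -> np.ndarray:
    """features: [num_actions, num_features] for one state; returns P(a|s)."""
    logits = features @ theta          # linear score per candidate action
    logits -= logits.max()             # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: three candidate policy levels described by two state features.
theta = np.array([0.5, -1.2])
phi = np.array([[1.0, 0.2],
                [0.8, 0.5],
                [0.3, 0.9]])
print(log_linear_policy(theta, phi))   # probabilities summing to 1
```

This linear-in-features form is what makes such policies interpretable: each learned weight in theta directly indicates how a feature shifts the chosen policy level.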
arXiv Detail & Related papers (2021-08-06T01:30:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.