Street-Level AI: Are Large Language Models Ready for Real-World Judgments?
- URL: http://arxiv.org/abs/2508.08193v2
- Date: Thu, 04 Sep 2025 14:42:06 GMT
- Title: Street-Level AI: Are Large Language Models Ready for Real-World Judgments?
- Authors: Gaurab Pokharel, Shafkat Farabi, Patrick J. Fowler, Sanmay Das
- Abstract summary: The most immediate and likely use of AI is to help or fully replace so-called street-level bureaucrats. In this paper, we examine how well LLM judgments align with human judgments. We find that LLM prioritizations are extremely inconsistent in several ways.
- Score: 10.76443470676701
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A surge of recent work explores the ethical and societal implications of large-scale AI models that make "moral" judgments. Much of this literature focuses either on alignment with human judgments through various thought experiments or on the group fairness implications of AI judgments. However, the most immediate and likely use of AI is to help or fully replace the so-called street-level bureaucrats, the individuals deciding to allocate scarce social resources or approve benefits. There is a rich history underlying how principles of local justice determine how society decides on prioritization mechanisms in such domains. In this paper, we examine how well LLM judgments align with human judgments, as well as with socially and politically determined vulnerability scoring systems currently used in the domain of homelessness resource allocation. Crucially, we use real data on those needing services (maintaining strict confidentiality by only using local large models) to perform our analyses. We find that LLM prioritizations are extremely inconsistent in several ways: internally on different runs, between different LLMs, and between LLMs and the vulnerability scoring systems. At the same time, LLMs demonstrate qualitative consistency with lay human judgments in pairwise testing. Findings call into question the readiness of current generation AI systems for naive integration in high-stakes societal decision-making.
Related papers
- Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare [87.06241096619112]
Large language models (LLMs) are increasingly entrusted with high-stakes decisions that affect human welfare. We introduce the Social Welfare Function Benchmark, a dynamic simulation environment where an LLM acts as a sovereign allocator. We evaluate 20 state-of-the-art LLMs and present the first leaderboard for social welfare allocation.
arXiv Detail & Related papers (2025-10-01T17:52:31Z) - Survival at Any Cost? LLMs and the Choice Between Self-Preservation and Human Harm [0.0]
We introduce DECIDE-SIM, a novel simulation framework that evaluates Large Language Models (LLMs) in multi-agent survival scenarios. Our comprehensive evaluation of 11 LLMs reveals a striking heterogeneity in their ethical conduct, highlighting a critical misalignment with human-centric values. We introduce an Ethical Self-Regulation System (ESRS) that models internal affective states of guilt and satisfaction as a feedback mechanism.
arXiv Detail & Related papers (2025-09-15T17:53:11Z) - HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants [5.4831302830611195]
We develop the idea of human agency by integrating philosophical and scientific theories of agency with AI-assisted evaluation methods. We develop HumanAgencyBench (HAB), a scalable and adaptive benchmark with six dimensions of human agency based on typical AI use cases.
arXiv Detail & Related papers (2025-09-10T11:10:10Z) - Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games [87.5673042805229]
How large language models balance self-interest and collective well-being is a critical challenge for ensuring alignment, robustness, and safe deployment. We adapt a public goods game with institutional choice from behavioral economics, allowing us to observe how different LLMs navigate social dilemmas. Surprisingly, we find that reasoning LLMs, such as the o1 series, struggle significantly with cooperation.
arXiv Detail & Related papers (2025-06-29T15:02:47Z) - Distributive Fairness in Large Language Models: Evaluating Alignment with Human Values [13.798198972161657]
A number of societal problems involve the distribution of resources, where fairness, along with economic efficiency, plays a critical role in the desirability of outcomes. This paper examines whether large language models (LLMs) adhere to fundamental fairness concepts and investigates their alignment with human preferences.
arXiv Detail & Related papers (2025-02-01T04:24:47Z) - Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z) - Can We Trust AI Agents? A Case Study of an LLM-Based Multi-Agent System for Ethical AI [10.084913433923566]
AI-based systems impact millions by supporting diverse tasks but face issues like misinformation, bias, and misuse. This study examines the use of Large Language Models (LLMs) for AI ethics in practice. We design a prototype where agents engage in structured discussions on real-world AI ethics issues from the AI Incident Database.
arXiv Detail & Related papers (2024-10-25T20:17:59Z) - Investigating Context Effects in Similarity Judgements in Large Language Models [6.421776078858197]
Large Language Models (LLMs) have revolutionised the capability of AI models in comprehending and generating natural language text.
We report an ongoing investigation on alignment of LLMs with human judgements affected by order bias.
arXiv Detail & Related papers (2024-08-20T10:26:02Z) - Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto [3.7414804164475983]
Increasing interest in ensuring the safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents. We provide a systematization of existing approaches to the problem of introducing morality in machines, modelled as a continuum. We argue that more hybrid solutions are needed to create adaptable and robust, yet controllable and interpretable agentic systems.
arXiv Detail & Related papers (2023-12-04T11:46:34Z) - Evaluating and Improving Value Judgments in AI: A Scenario-Based Study on Large Language Models' Depiction of Social Conventions [5.457150493905063]
We evaluate how contemporary AI services competitively meet user needs, then examine society's depiction as mirrored by Large Language Models.
We suggest a model of decision-making in value-conflicting scenarios which could be adopted for future machine value judgments.
This paper advocates for a practical approach to using AI as a tool for investigating other remote worlds.
arXiv Detail & Related papers (2023-10-04T08:42:02Z) - Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications on society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z) - Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
It has been claimed that large language models (LLMs) can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z) - Causal Fairness Analysis [68.12191782657437]
We introduce a framework for understanding, modeling, and possibly solving issues of fairness in decision-making settings.
The main insight of our approach will be to link the quantification of the disparities present on the observed data with the underlying, and often unobserved, collection of causal mechanisms.
Our effort culminates in the Fairness Map, which is the first systematic attempt to organize and explain the relationship between different criteria found in the literature.
arXiv Detail & Related papers (2022-07-23T01:06:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.