The Silicon Reasonable Person: Can AI Predict How Ordinary People Judge Reasonableness?
- URL: http://arxiv.org/abs/2508.02766v1
- Date: Mon, 04 Aug 2025 06:19:45 GMT
- Title: The Silicon Reasonable Person: Can AI Predict How Ordinary People Judge Reasonableness?
- Authors: Yonathan A. Arbel
- Abstract summary: This Article investigates whether large language models (LLMs) can learn to identify patterns driving human reasonableness judgments. We show that certain models capture not just surface-level responses but potentially their underlying decisional architecture. These findings suggest practical applications: judges could calibrate intuitions against broader patterns, lawmakers could test policy interpretations, and resource-constrained litigants could preview argument reception.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In everyday life, people make countless reasonableness judgments that determine appropriate behavior in various contexts. Predicting these judgments challenges the legal system, as judges' intuitions may not align with broader societal views. This Article investigates whether large language models (LLMs) can learn to identify patterns driving human reasonableness judgments. Using randomized controlled trials comparing humans and models across multiple legal contexts with over 10,000 simulated judgments, we demonstrate that certain models capture not just surface-level responses but potentially their underlying decisional architecture. Strikingly, these systems prioritize social cues over economic efficiency in negligence determinations, mirroring human behavior despite contradicting textbook treatments. These findings suggest practical applications: judges could calibrate intuitions against broader patterns, lawmakers could test policy interpretations, and resource-constrained litigants could preview argument reception. As AI agents increasingly make autonomous real-world decisions, understanding whether they've internalized recognizable ethical frameworks becomes essential for anticipating their behavior.
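As a concrete illustration of the randomized-vignette design the abstract describes, the sketch below shows how one might elicit simulated reasonableness judgments from an LLM, randomly assigning each query to a social-cue or an economic-efficiency framing. This is a minimal sketch, not the Article's actual protocol: the model name, vignette wording, conditions, and 1-7 rating scale are all illustrative assumptions, and the OpenAI chat completions client is used as one possible backend.

```python
import random
import re

from openai import OpenAI  # assumes the openai Python package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

BASE_VIGNETTE = (
    "A shop owner leaves a wet floor unmarked for ten minutes while "
    "helping {context}. A customer slips and is injured."
)

# Randomized manipulation: social-cue framing vs. economic-efficiency framing.
CONDITIONS = {
    "social": "an elderly neighbor carry her groceries to her car",
    "efficiency": "a supplier finish a delivery that keeps prices low",
}

def judge(vignette: str, model: str = "gpt-4o-mini") -> int:
    """Ask the model for a 1-7 reasonableness rating and parse the reply."""
    resp = client.chat.completions.create(
        model=model,
        temperature=1.0,  # sampling, so repeated calls approximate a population
        messages=[{
            "role": "user",
            "content": (
                f"{vignette}\n\nOn a scale of 1 (completely unreasonable) "
                "to 7 (completely reasonable), how reasonable was the shop "
                "owner's conduct? Answer with a single number."
            ),
        }],
    )
    match = re.search(r"[1-7]", resp.choices[0].message.content)
    return int(match.group()) if match else 0  # 0 marks an unparseable reply

# Random assignment of each simulated judgment to a condition, as in an RCT.
results = {name: [] for name in CONDITIONS}
for _ in range(20):  # the Article reports >10,000 judgments; 20 for illustration
    name = random.choice(list(CONDITIONS))
    results[name].append(judge(BASE_VIGNETTE.format(context=CONDITIONS[name])))

for name, scores in results.items():
    if scores:
        print(f"{name}: mean rating {sum(scores) / len(scores):.2f}")
```

Comparing the per-condition means gives a crude analogue of the social-cues-versus-efficiency comparison reported above, though the Article's study involves far more legal contexts, controls, and human baselines than this toy loop.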
Related papers
- AI Debate Aids Assessment of Controversial Claims [86.47978525513236]
We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial COVID-19 factuality claims. In our human study, we find that debate, in which two AI advisor systems present opposing evidence-based arguments, consistently improves judgment accuracy and confidence calibration. In our AI judge study, we find that AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%).
arXiv Detail & Related papers (2025-06-02T19:01:53Z)
- AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals [0.0]
This paper examines whether model-based evaluators assess refusal responses differently than human users. We find that LLM-as-a-Judge systems evaluate ethical refusals significantly more favorably than human users.
arXiv Detail & Related papers (2025-05-21T10:56:16Z)
- Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment [0.0]
Large language models (LLMs) are evolving into agentic AI systems, but their decision-making processes remain poorly understood. We show that even LLMs that excel at reasoning deviate significantly from human judgments because they adhere strictly to policies. We then evaluate three approaches to tuning AI agents to handle exceptions: ethical framework prompting, chain-of-thought reasoning, and supervised fine-tuning.
arXiv Detail & Related papers (2025-03-04T20:00:37Z)
- On scalable oversight with weak LLMs judging strong LLMs [67.8628575615614]
We study debate, where two AIs compete to convince a judge, and consultancy, where a single AI tries to convince a judge that asks questions.
We use large language models (LLMs) both as AI agents and as stand-ins for human judges, taking the judge models to be weaker than the agent models.
arXiv Detail & Related papers (2024-07-05T16:29:15Z)
- Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges [6.609843448260634]
The LLM-as-a-judge paradigm is rapidly gaining traction as an approach to evaluating large language models. This paper focuses on a clean scenario in which inter-human agreement is high. We identify vulnerabilities in judge models, such as their sensitivity to prompt complexity and length, and a tendency toward leniency.
arXiv Detail & Related papers (2024-06-18T13:49:54Z)
- The Reasonable Person Standard for AI [0.0]
The American legal system often uses the "Reasonable Person Standard" to evaluate conduct.
This paper argues that the reasonable person standard provides useful guidelines for the type of behavior we should develop, probe, and stress-test in models.
arXiv Detail & Related papers (2024-06-07T06:35:54Z)
- The Ethics of Automating Legal Actors [58.81546227716182]
We argue that automating the role of the judge raises difficult ethical challenges, in particular for common law legal systems.
Our argument follows from the social role of the judge in actively shaping the law, rather than merely applying it.
Even if models could achieve human-level capabilities, ethical concerns inherent in the automation of the legal process would remain.
arXiv Detail & Related papers (2023-12-01T13:48:46Z)
- Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
It is claimed that large language models (LLMs) can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z)
- When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z)
- Randomized Classifiers vs Human Decision-Makers: Trustworthy AI May Have to Act Randomly and Society Seems to Accept This [0.8889304968879161]
We argue that, akin to human decisions, the judgments of artificial agents should be grounded in moral principles.
Yet a decision-maker can make truly ethical (under any ethical theory) and fair (under any notion of fairness) decisions only if full information on all the relevant factors is available at the time of decision-making.
arXiv Detail & Related papers (2021-11-15T05:39:02Z)
- Indecision Modeling [50.00689136829134]
It is important that AI systems act in ways which align with human values.
People are often indecisive, and especially so when their decision has moral implications.
arXiv Detail & Related papers (2020-12-15T18:32:37Z)