The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction
- URL: http://arxiv.org/abs/2507.13732v1
- Date: Fri, 18 Jul 2025 08:28:53 GMT
- Title: The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction
- Authors: Guillaume Zambrano
- Abstract summary: This study examines the role of human judges in legal decision-making. It uses machine learning to predict child physical custody outcomes in French appellate courts.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study examines the role of human judges in legal decision-making by using machine learning to predict child physical custody outcomes in French appellate courts. Building on the legal realism-formalism debate, we test whether individual judges' decision-making patterns significantly influence case outcomes, challenging the assumption that judges are neutral variables that apply the law uniformly. To ensure compliance with French privacy laws, we implement a strict pseudonymization process. Our analysis uses 18,937 living-arrangement rulings extracted from 10,306 cases. We compare models trained on individual judges' past rulings (specialist models) with a judge-agnostic model trained on aggregated data (the generalist model). The prediction pipeline is a hybrid approach combining large language models (LLMs) for structured feature extraction and ML models (RF, XGB, and SVC) for outcome prediction. Our results show that specialist models consistently achieve higher predictive accuracy than the generalist model, with top-performing models reaching F1 scores as high as 92.85%, compared to 82.63% for the generalist model, which was trained on 20x to 100x more samples. Specialist models capture stable individual patterns that are not transferable to other judges. In-Domain and Cross-Domain validity tests provide empirical support for legal realism, demonstrating that judicial identity plays a measurable role in legal outcomes. All data and code used will be made available.
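The abstract describes a two-stage pipeline: an LLM extracts structured features from each ruling, then conventional classifiers (RF, XGB, SVC, read here as random forest, XGBoost, and a support vector classifier) predict the custody outcome, once per judge (specialist models) and once on the pooled data (generalist model). The following is a minimal scikit-learn/XGBoost sketch of that comparison; the file name, column names, and the 100-ruling cutoff are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the specialist-vs-generalist comparison described in the
# abstract. The feature-extraction step, file name, column names, and the
# per-judge sample cutoff are assumptions for illustration only.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from xgboost import XGBClassifier


def train_and_score(df: pd.DataFrame, label_col: str = "custody_outcome") -> dict:
    """Fit RF, XGB, and SVC on one set of rulings and return weighted F1 scores."""
    X = df.drop(columns=[label_col, "judge_id"])
    y = LabelEncoder().fit_transform(df[label_col])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y
    )
    models = {
        "RF": RandomForestClassifier(n_estimators=300, random_state=0),
        "XGB": XGBClassifier(eval_metric="logloss", random_state=0),
        "SVC": SVC(kernel="rbf"),
    }
    return {
        name: f1_score(y_te, model.fit(X_tr, y_tr).predict(X_te), average="weighted")
        for name, model in models.items()
    }


# One row per ruling: LLM-extracted structured features, a pseudonymized
# judge_id, and the custody outcome label (hypothetical file and columns).
rulings = pd.read_csv("rulings_features.csv")

generalist_scores = train_and_score(rulings)  # judge-agnostic model on all data
specialist_scores = {  # one model per judge, trained only on that judge's rulings
    judge: train_and_score(group)
    for judge, group in rulings.groupby("judge_id")
    if len(group) >= 100  # skip judges with too few past rulings
}
```

Comparing `specialist_scores` against `generalist_scores` mirrors the paper's claim that per-judge models can outperform a pooled model despite far less training data.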
Related papers
- The Silicon Reasonable Person: Can AI Predict How Ordinary People Judge Reasonableness? [0.0]
This Article investigates whether large language models (LLMs) can learn to identify patterns driving human reasonableness judgments. We show that certain models capture not just surface-level responses but potentially their underlying decisional architecture. These findings suggest practical applications: judges could calibrate intuitions against broader patterns, lawmakers could test policy interpretations, and resource-constrained litigants could preview argument reception.
arXiv Detail & Related papers (2025-08-04T06:19:45Z)
- From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks [11.01213914485374]
We study large language models (LLMs) as judges on mathematical reasoning tasks. Our analysis uncovers a strong correlation between judgment performance and the candidate models' task performance. As a consequence, we test whether we can predict the behavior of LLM judges using simple features such as part-of-speech tags.
arXiv Detail & Related papers (2024-09-06T10:09:41Z)
- Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST surpasses state-of-the-art baselines with superior debiasing performance. This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges [6.609843448260634]
The LLM-as-a-judge paradigm is rapidly gaining traction as an approach to evaluating large language models. This paper focuses on a clean scenario in which inter-human agreement is high. We identify vulnerabilities in judge models, such as their sensitivity to prompt complexity and length, and a tendency toward leniency.
arXiv Detail & Related papers (2024-06-18T13:49:54Z)
- Towards Explainability in Legal Outcome Prediction Models [64.00172507827499]
We argue that precedent is a natural way of facilitating explainability for legal NLP models.
By developing a taxonomy of legal precedent, we are able to compare human judges and neural models.
We find that while the models learn to predict outcomes reasonably well, their use of precedent is unlike that of human judges.
arXiv Detail & Related papers (2024-03-25T15:15:41Z)
- Aligning Large Language Models by On-Policy Self-Judgment [49.31895979525054]
Existing approaches for aligning large language models with human preferences face a trade-off that requires a separate reward model (RM) for on-policy learning.
We present a novel alignment framework, SELF-JUDGE, that does on-policy learning and is parameter efficient.
We show that rejection sampling by itself can improve performance further without an additional evaluator.
arXiv Detail & Related papers (2024-02-17T11:25:26Z)
- Do Charge Prediction Models Learn Legal Theory? [59.74220430434435]
We argue that trustworthy charge prediction models should take legal theories into consideration.
We propose three principles that trustworthy models should follow in this task: sensitivity, selectivity, and the presumption of innocence.
Our findings indicate that, while existing charge prediction models meet the selective principle on a benchmark dataset, most of them are still not sensitive enough and do not satisfy the presumption of innocence.
arXiv Detail & Related papers (2022-10-31T07:32:12Z)
- Accuracy, Fairness, and Interpretability of Machine Learning Criminal Recidivism Models [4.297070083645049]
Various machine learning-based criminal recidivism models are created based on a real-world parole decision dataset from the state of Georgia in the United States.
It is found that there are noted differences and trade-offs between accuracy, fairness, and being inherently interpretable.
arXiv Detail & Related papers (2022-09-14T17:53:24Z)
- Revealing Unfair Models by Mining Interpretable Evidence [50.48264727620845]
The popularity of machine learning has increased the risk of unfair models being deployed in high-stakes applications.
In this paper, we tackle the novel task of revealing unfair models by mining interpretable evidence.
Our method finds highly interpretable and solid evidence to effectively reveal the unfairness of trained models.
arXiv Detail & Related papers (2022-07-12T20:03:08Z)
- Predicting Indian Supreme Court Judgments, Decisions, Or Appeals [0.403831199243454]
We introduce our newly developed ML-enabled legal prediction model and its operational prototype, eLegPredict.
eLegPredict is trained and tested on 3,072 Supreme Court cases and achieves 76% accuracy (F1 score).
eLegPredict is equipped with a mechanism to aid end users: as soon as a document containing a new case description is dropped into a designated directory, the system reads its content and generates a prediction.
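A rough sketch of this drop-a-file workflow might look like the polling loop below; the directory names and the predict() stub are hypothetical illustrations, not part of eLegPredict itself.

```python
# Illustrative polling loop for the "drop a case description into a directory"
# workflow described above. Paths and the predict() stub are hypothetical.
import time
from pathlib import Path

INBOX = Path("incoming_cases")       # hypothetical watched directory
DONE = Path("processed_cases")       # where handled files are moved
INBOX.mkdir(exist_ok=True)
DONE.mkdir(exist_ok=True)


def predict(case_text: str) -> str:
    """Placeholder for the trained outcome classifier."""
    return "appeal allowed"          # stub prediction


while True:
    for doc in INBOX.glob("*.txt"):
        outcome = predict(doc.read_text(encoding="utf-8"))
        print(f"{doc.name}: predicted outcome -> {outcome}")
        doc.rename(DONE / doc.name)  # move the file so it is processed only once
    time.sleep(5)                    # poll the directory every few seconds
```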
arXiv Detail & Related papers (2021-09-28T18:28:43Z)
- Equality before the Law: Legal Judgment Consistency Analysis for Fairness [55.91612739713396]
In this paper, we propose an evaluation metric for judgment inconsistency, the Legal Inconsistency Coefficient (LInCo).
We simulate judges from different groups with legal judgment prediction (LJP) models and measure judicial inconsistency as the disagreement between the judgments given by LJP models trained on different groups.
We employ LInCo to explore inconsistency in real cases and observe that both regional and gender inconsistency exist in the legal system, though gender inconsistency is much smaller than regional inconsistency.
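The snippet does not give the exact LInCo formula; as a loose illustration of the underlying idea, measuring inconsistency as disagreement between group-specific LJP models on the same cases, a pairwise-disagreement sketch could look like this (group names and predictions are made up):

```python
# Sketch of measuring judicial inconsistency as disagreement between
# group-specific LJP models. This is NOT the LInCo formula from the paper;
# it only illustrates pairwise prediction disagreement on a shared test set.
from itertools import combinations

import numpy as np


def pairwise_disagreement(predictions: dict[str, np.ndarray]) -> float:
    """Average fraction of test cases on which two group models disagree."""
    pairs = list(combinations(predictions, 2))
    rates = [np.mean(predictions[a] != predictions[b]) for a, b in pairs]
    return float(np.mean(rates))


# Predictions of LJP models trained on different (hypothetical) groups,
# all evaluated on the same held-out cases.
preds = {
    "region_A": np.array([1, 0, 1, 1, 0]),
    "region_B": np.array([1, 1, 1, 0, 0]),
    "region_C": np.array([1, 0, 1, 1, 1]),
}
print(pairwise_disagreement(preds))  # higher value = more inconsistency
```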
arXiv Detail & Related papers (2021-03-25T14:28:00Z)