Related papers: When Fairness Isn't Statistical: The Limits of Machine Learning in Evaluating Legal Reasoning

When Fairness Isn't Statistical: The Limits of Machine Learning in Evaluating Legal Reasoning

URL: http://arxiv.org/abs/2506.03913v1
Date: Wed, 04 Jun 2025 13:05:37 GMT
Title: When Fairness Isn't Statistical: The Limits of Machine Learning in Evaluating Legal Reasoning
Authors: Claire Barale, Michael Rovatsos, Nehal Bhuta,
Abstract summary: Legal decisions are increasingly evaluated for fairness, consistency, and bias using machine learning (ML) techniques.<n>In high-stakes domains like refugee adjudication, such methods are often applied to detect disparities in outcomes.<n>Yet it remains unclear whether statistical methods can meaningfully assess fairness in legal contexts shaped by discretion, normative complexity, and limited ground truth.
Score: 4.096453902709292
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Legal decisions are increasingly evaluated for fairness, consistency, and bias using machine learning (ML) techniques. In high-stakes domains like refugee adjudication, such methods are often applied to detect disparities in outcomes. Yet it remains unclear whether statistical methods can meaningfully assess fairness in legal contexts shaped by discretion, normative complexity, and limited ground truth. In this paper, we empirically evaluate three common ML approaches (feature-based analysis, semantic clustering, and predictive modeling) on a large, real-world dataset of 59,000+ Canadian refugee decisions (AsyLex). Our experiments show that these methods produce divergent and sometimes contradictory signals, that predictive modeling often depends on contextual and procedural features rather than legal features, and that semantic clustering fails to capture substantive legal reasoning. We show limitations of statistical fairness evaluation, challenge the assumption that statistical regularity equates to fairness, and argue that current computational approaches fall short of evaluating fairness in legally discretionary domains. We argue that evaluating fairness in law requires methods grounded not only in data, but in legal reasoning and institutional context.

Related papers

Targeted Learning for Data Fairness [52.59573714151884]
We expand fairness inference by evaluating fairness in the data generating process itself.<n>We derive estimators demographic parity, equal opportunity, and conditional mutual information.<n>To validate our approach, we perform several simulations and apply our estimators to real data.
arXiv Detail & Related papers (2025-02-06T18:51:28Z)
Rethinking Fair Representation Learning for Performance-Sensitive Tasks [19.40265690963578]
We use causal reasoning to define and formalise different sources of dataset bias.<n>We run experiments across a range of medical modalities to examine the performance of fair representation learning under distribution shifts.
arXiv Detail & Related papers (2024-10-05T11:01:16Z)
Parametric Fairness with Statistical Guarantees [0.46040036610482665]
We extend the concept of Demographic Parity to incorporate distributional properties in predictions, allowing expert knowledge to be used in the fair solution. We illustrate the use of this new metric through a practical example of wages, and develop a parametric method that efficiently addresses practical challenges.
arXiv Detail & Related papers (2023-10-31T14:52:39Z)
Compatibility of Fairness Metrics with EU Non-Discrimination Laws: Demographic Parity & Conditional Demographic Disparity [3.5607241839298878]
Empirical evidence suggests that algorithmic decisions driven by Machine Learning (ML) techniques threaten to discriminate against legally protected groups or create new sources of unfairness. This work aims at assessing up to what point we can assure legal fairness through fairness metrics and under fairness constraints. Our experiments and analysis suggest that AI-assisted decision-making can be fair from a legal perspective depending on the case at hand and the legal justification.
arXiv Detail & Related papers (2023-06-14T09:38:05Z)
Reconciling Predictive and Statistical Parity: A Causal Approach [68.59381759875734]
We propose a new causal decomposition formula for the fairness measures associated with predictive parity. We show that the notions of statistical and predictive parity are not really mutually exclusive, but complementary and spanning a spectrum of fairness notions.
arXiv Detail & Related papers (2023-06-08T09:23:22Z)
Individual Fairness under Uncertainty [26.183244654397477]
Algorithmic fairness is an established area in machine learning (ML) algorithms. We propose an individual fairness measure and a corresponding algorithm that deal with the challenges of uncertainty arising from censorship in class labels. We argue that this perspective represents a more realistic model of fairness research for real-world application deployment.
arXiv Detail & Related papers (2023-02-16T01:07:58Z)
Beyond Incompatibility: Trade-offs between Mutually Exclusive Fairness Criteria in Machine Learning and Law [2.959308758321417]
We present a novel algorithm (FAir Interpolation Method: FAIM) for continuously interpolating between three fairness criteria.<n>We demonstrate the effectiveness of our algorithm when applied to synthetic data, the COMPAS data set, and a new, real-world data set from the e-commerce sector.
arXiv Detail & Related papers (2022-12-01T12:47:54Z)
Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem. We examine the performance of various debiasing methods across multiple tasks. We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features. We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
Normalise for Fairness: A Simple Normalisation Technique for Fairness in Regression Machine Learning Problems [46.93320580613236]
We present a simple, yet effective method based on normalisation (FaiReg) for regression problems. We compare it with two standard methods for fairness, namely data balancing and adversarial training. The results show the superior performance of diminishing the effects of unfairness better than data balancing.
arXiv Detail & Related papers (2022-02-02T12:26:25Z)
Equality before the Law: Legal Judgment Consistency Analysis for Fairness [55.91612739713396]
In this paper, we propose an evaluation metric for judgment inconsistency, Legal Inconsistency Coefficient (LInCo) We simulate judges from different groups with legal judgment prediction (LJP) models and measure the judicial inconsistency with the disagreement of the judgment results given by LJP models trained on different groups. We employ LInCo to explore the inconsistency in real cases and come to the following observations: (1) Both regional and gender inconsistency exist in the legal system, but gender inconsistency is much less than regional inconsistency.
arXiv Detail & Related papers (2021-03-25T14:28:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.