Mining Legal Arguments to Study Judicial Formalism
- URL: http://arxiv.org/abs/2512.11374v1
- Date: Fri, 12 Dec 2025 08:37:53 GMT
- Title: Mining Legal Arguments to Study Judicial Formalism
- Authors: Tomáš Koref, Lena Held, Mahammad Namazov, Harun Kumru, Yassine Thlija, Christoph Burchard, Ivan Habernal
- Abstract summary: This study refutes claims about formalistic judging in Central and Eastern Europe (CEE) by developing automated methods to detect and classify judicial reasoning. We create the MADON dataset of 272 decisions from two Czech Supreme Courts with expert annotations of 9,183 paragraphs. Our three-stage pipeline combining ModernBERT, Llama 3.1, and traditional feature-based machine learning achieves promising results for decision classification.
- Score: 7.685444048563301
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Courts must justify their decisions, but systematically analyzing judicial reasoning at scale remains difficult. This study refutes claims about formalistic judging in Central and Eastern Europe (CEE) by developing automated methods to detect and classify judicial reasoning in Czech Supreme Courts' decisions using state-of-the-art natural language processing methods. We create the MADON dataset of 272 decisions from two Czech Supreme Courts with expert annotations of 9,183 paragraphs with eight argument types and holistic formalism labels for supervised training and evaluation. Using a corpus of 300k Czech court decisions, we adapt transformer LLMs to the Czech legal domain by continued pretraining and experiment with methods to address dataset imbalance, including asymmetric loss and class weighting. The best models successfully detect argumentative paragraphs (82.6% macro-F1), classify traditional types of legal argument (77.5% macro-F1), and classify decisions as formalistic/non-formalistic (83.2% macro-F1). Our three-stage pipeline combining ModernBERT, Llama 3.1, and traditional feature-based machine learning achieves promising results for decision classification while reducing computational costs and increasing explainability. Empirically, we challenge prevailing narratives about CEE formalism. This work shows that legal argument mining enables reliable judicial philosophy classification and has potential for other important tasks in computational legal studies. Our methodology is easily replicable across jurisdictions, and our entire pipeline, datasets, guidelines, models, and source codes are available at https://github.com/trusthlt/madon.
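The abstract reports all three results as macro-F1, which is the unweighted mean of per-class F1 scores and is a common choice when classes are imbalanced. The following is a minimal sketch of that metric in plain Python; the function name `macro_f1` and the toy labels are illustrative assumptions, not code or data from the MADON repository.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy labels (not taken from MADON).
y_true = ["formalistic"] * 8 + ["non-formalistic"] * 2
y_pred = ["formalistic"] * 10  # a classifier that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}, macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the majority-class predictor reaches 80% accuracy but a much lower macro-F1, because the minority class contributes an F1 of zero with the same weight as the majority class.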
Related papers
- LawThinker: A Deep Research Legal Agent in Dynamic Environments [51.782293183431676]
LawThinker is an autonomous legal research agent.
It enforces verification as an atomic operation after every knowledge exploration step.
LawThinker achieves a 24% improvement over direct reasoning.
arXiv Detail & Related papers (2026-02-12T15:19:11Z)
- LegalOne: A Family of Foundation Models for Reliable Legal Reasoning [54.57434222018289]
We present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain.
LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning.
We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI.
arXiv Detail & Related papers (2026-01-31T10:18:32Z)
- Dissecting Judicial Reasoning in U.S. Copyright Damage Awards [0.21485350418225238]
Judicial reasoning in copyright damage awards poses a core challenge for computational legal analysis.
Although federal courts follow the 1976 Copyright Act, their interpretations and factor weightings vary widely across jurisdictions.
This research introduces a novel discourse-based Large Language Model (LLM) methodology that integrates Rhetorical Structure Theory (RST) with an agentic workflow.
arXiv Detail & Related papers (2026-01-14T13:09:16Z)
- ReaKase-8B: Legal Case Retrieval via Knowledge and Reasoning Representations with LLMs [37.688405624086315]
A novel ReaKase-8B framework is proposed to leverage extracted legal facts, legal issues, legal relation triplets and legal reasoning for effective legal case retrieval.
Experiments on two benchmark datasets from COLIEE 2022 and COLIEE 2023 demonstrate that our knowledge and reasoning augmented embeddings substantially improve retrieval performance.
arXiv Detail & Related papers (2025-10-30T06:35:36Z)
- The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction [0.0]
This study examines the role of human judges in legal decision-making.
It uses machine learning to predict child physical custody outcomes in French appellate courts.
arXiv Detail & Related papers (2025-07-18T08:28:53Z)
- RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models [58.69183479148083]
Legal Judgment Prediction (LJP) is a pivotal task in legal AI.
Existing LJP models integrate judicial precedents and legal knowledge for high performance.
However, they neglect legal reasoning logic, a critical component of legal judgments that requires rigorous logical analysis.
This paper proposes a rule-enhanced legal judgment prediction framework based on first-order logic (FOL) formalism and comparative learning (CL).
arXiv Detail & Related papers (2025-05-27T14:50:21Z)
- AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction [56.797874973414636]
AnnoCaseLaw is a first-of-its-kind dataset of 471 meticulously annotated U.S. Appeals Court negligence cases.
Our dataset lays the groundwork for more human-aligned, explainable Legal Judgment Prediction models.
Results demonstrate that LJP remains a formidable task, with application of legal precedent proving particularly difficult.
arXiv Detail & Related papers (2025-02-28T19:14:48Z)
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
- LLM vs. Lawyers: Identifying a Subset of Summary Judgments in a Large UK Case Law Dataset [0.0]
This study addresses the gap in the literature working with large legal corpora about how to isolate cases, in our case summary judgments, from a large corpus of UK court decisions.
We use the Cambridge Law Corpus of 356,011 UK court decisions and determine that the large language model achieves a weighted F1 score of 0.94 versus 0.78 for keywords.
We identify and extract 3,102 summary judgment cases, enabling us to map their distribution across various UK courts over a temporal span.
arXiv Detail & Related papers (2024-03-04T10:13:30Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
- Mining Legal Arguments in Court Decisions [43.09204050756282]
We develop a new annotation scheme for legal arguments in proceedings of the European Court of Human Rights.
Second, we compile and annotate a large corpus of 373 court decisions.
Third, we train an argument mining model that outperforms state-of-the-art models in the legal NLP domain.
arXiv Detail & Related papers (2022-08-12T08:59:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.