Related papers: Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics

URL: http://arxiv.org/abs/2512.01020v1
Date: Sun, 30 Nov 2025 18:32:43 GMT
Title: Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics
Authors: Jinu Lee, Kyoung-Woon On, Simeng Han, Arman Cohan, Julia Hockenmaier,
Abstract summary: We introduce LEGIT (LEGal Issue Trees), a novel large-scale (24K instances) expert-level legal reasoning dataset. We convert court judgments into hierarchical trees of opposing parties' arguments and the court's conclusions, which serve as rubrics for evaluating the issue coverage and correctness of the reasoning traces.
Score: 49.3262123849242
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Evaluating the quality of LLM-generated reasoning traces in expert domains (e.g., law) is essential for ensuring credibility and explainability, yet remains challenging due to the inherent complexity of such reasoning tasks. We introduce LEGIT (LEGal Issue Trees), a novel large-scale (24K instances) expert-level legal reasoning dataset with an emphasis on reasoning trace evaluation. We convert court judgments into hierarchical trees of opposing parties' arguments and the court's conclusions, which serve as rubrics for evaluating the issue coverage and correctness of the reasoning traces. We verify the reliability of these rubrics via human expert annotations and comparison with coarse, less informative rubrics. Using the LEGIT dataset, we show that (1) LLMs' legal reasoning ability is seriously affected by both legal issue coverage and correctness, and that (2) retrieval-augmented generation (RAG) and RL with rubrics bring complementary benefits for legal reasoning abilities, where RAG improves overall reasoning capability, whereas RL improves correctness albeit with reduced coverage.

Related papers

LegalOne: A Family of Foundation Models for Reliable Legal Reasoning [54.57434222018289]
We present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI.
arXiv Detail & Related papers (2026-01-31T10:18:32Z)
Dissecting Judicial Reasoning in U.S. Copyright Damage Awards [0.21485350418225238]
Judicial reasoning in copyright damage awards poses a core challenge for computational legal analysis. Although federal courts follow the 1976 Copyright Act, their interpretations and factor weightings vary widely across jurisdictions. This research introduces a novel discourse-based Large Language Model (LLM) methodology that integrates Rhetorical Structure Theory (RST) with an agentic workflow.
arXiv Detail & Related papers (2026-01-14T13:09:16Z)
CLaw: Benchmarking Chinese Legal Knowledge in Large Language Models - A Fine-grained Corpus and Reasoning Analysis [13.067377421250557]
Large Language Models (LLMs) are increasingly tasked with analyzing legal texts and citing relevant statutes. This paper introduces CLaw, a novel benchmark specifically engineered to meticulously evaluate LLMs on Chinese legal knowledge and its application in reasoning.
arXiv Detail & Related papers (2025-09-25T14:19:51Z)
ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation [56.79698529022327]
Legal claims refer to the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution. This paper explores the problem of legal claim generation based on the given case's facts. We construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task.
arXiv Detail & Related papers (2025-08-24T07:19:25Z)
GLARE: Agentic Reasoning for Legal Judgment Prediction [60.13483016810707]
Legal judgment prediction (LJP) has become increasingly important in the legal field. Existing large language models (LLMs) have significant problems of insufficient reasoning due to a lack of legal knowledge. We introduce GLARE, an agentic legal reasoning framework that dynamically acquires key legal knowledge by invoking different modules.
arXiv Detail & Related papers (2025-08-22T13:38:12Z)
LegalReasoner: Step-wised Verification-Correction for Legal Judgment Reasoning [25.808321575139537]
Legal judgment prediction (LJP) aims to function as a judge by making final rulings based on case claims and facts. We propose LegalReasoner, which enhances LJP reliability through step-wise verification and correction of the reasoning process. We release the LegalHK dataset, containing 58,130 Hong Kong court cases with detailed annotations of dispute points, step-by-step reasoning chains, and process verification labels.
arXiv Detail & Related papers (2025-06-09T05:48:35Z)
RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models [58.69183479148083]
Legal Judgment Prediction (LJP) is a pivotal task in legal AI. Existing LJP models integrate judicial precedents and legal knowledge for high performance, but they neglect legal reasoning logic, a critical component of legal judgments requiring rigorous logical analysis. This paper proposes a rule-enhanced legal judgment prediction framework based on first-order logic (FOL) formalism and comparative learning (CL).
arXiv Detail & Related papers (2025-05-27T14:50:21Z)
Incorporating Legal Structure in Retrieval-Augmented Generation: A Case Study on Copyright Fair Use [44.99833362998488]
This paper presents a domain-specific implementation of Retrieval-Augmented Generation tailored to the Fair Use Doctrine in U.S. copyright law. Motivated by the increasing prevalence of DMCA takedowns and the lack of accessible legal support for content creators, we propose a structured approach that combines semantic search with legal knowledge graphs and court citation networks to improve retrieval quality and reasoning reliability.
arXiv Detail & Related papers (2025-05-04T15:53:49Z)
A Law Reasoning Benchmark for LLM with Tree-Organized Structures including Factum Probandum, Evidence and Experiences [76.73731245899454]
We propose a transparent law reasoning schema enriched with hierarchical factum probandum, evidence, and implicit experience. Inspired by this schema, we introduce a challenging task that takes a textual case description and outputs a hierarchical structure justifying the final decision. This benchmark paves the way for transparent and accountable AI-assisted law reasoning in the "Intelligent Court".
arXiv Detail & Related papers (2025-03-02T10:26:54Z)
Artificial Intelligence and Legal Analysis: Implications for Legal Education and the Profession [0.0]
This article reports the results of a study examining the ability of legal and nonlegal Large Language Models to perform legal analysis. The results show that LLMs can conduct basic IRAC analysis, but are limited by brief responses lacking detail, an inability to commit to answers, false confidence, and hallucinations.
arXiv Detail & Related papers (2025-02-04T19:50:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.