Related papers: AppellateGen: A Benchmark for Appellate Legal Judgment Generation

URL: http://arxiv.org/abs/2601.01331v2
Date: Thu, 08 Jan 2026 04:49:41 GMT
Title: AppellateGen: A Benchmark for Appellate Legal Judgment Generation
Authors: Hongkun Yang, Lionel Z. Wang, Wei Fan, Yiran Hu, Lixu Wang, Chenyu Liu, Shenghong Fu, Haoyang Li, Xin Xu, Jiexin Zheng, Wei Dong,
Abstract summary: We introduce AppellateGen, a benchmark for second-instance legal judgment generation comprising 7,351 case pairs. The task requires models to draft legally binding judgments by reasoning over the initial verdict and evidentiary updates. We propose a judicial Standard Operating Procedure (SOP)-based Legal Multi-Agent System (SLMAS) to simulate judicial workflows, which decomposes the generation process into discrete stages of issue identification, retrieval, and drafting.
Score: 30.9030336647868
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Legal judgment generation is a critical task in legal intelligence. However, existing research in legal judgment generation has predominantly focused on first-instance trials, relying on static fact-to-verdict mappings while neglecting the dialectical nature of appellate (second-instance) review. To address this, we introduce AppellateGen, a benchmark for second-instance legal judgment generation comprising 7,351 case pairs. The task requires models to draft legally binding judgments by reasoning over the initial verdict and evidentiary updates, thereby modeling the causal dependency between trial stages. We further propose a judicial Standard Operating Procedure (SOP)-based Legal Multi-Agent System (SLMAS) to simulate judicial workflows, which decomposes the generation process into discrete stages of issue identification, retrieval, and drafting. Experimental results indicate that while SLMAS improves logical consistency, the complexity of appellate reasoning remains a substantial challenge for current LLMs. The dataset and code are publicly available at: https://anonymous.4open.science/r/AppellateGen-5763.

Related papers

LawThinker: A Deep Research Legal Agent in Dynamic Environments [51.782293183431676]
LawThinker is an autonomous legal research agent. It enforces verification as an atomic operation after every knowledge exploration step. LawThinker achieves a 24% improvement over direct reasoning.
arXiv Detail & Related papers (2026-02-12T15:19:11Z)
LegalOne: A Family of Foundation Models for Reliable Legal Reasoning [54.57434222018289]
We present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI.
arXiv Detail & Related papers (2026-01-31T10:18:32Z)
CaseFacts: A Benchmark for Legal Fact-Checking and Precedent Retrieval [5.305110876082343]
CaseFacts is a benchmark for verifying legal claims against U.S. Supreme Court precedents. The dataset consists of 6,294 claims categorized as Supported, Refuted, or Overruled.
arXiv Detail & Related papers (2026-01-23T23:41:46Z)
Hybrid Retrieval-Augmented Generation Agent for Trustworthy Legal Question Answering in Judicial Forensics [30.232667436008978]
We present a hybrid legal QA agent tailored for judicial settings. It integrates retrieval-augmented generation (RAG) with multi-model ensembling to deliver reliable, auditable, and continuously updatable counsel.
arXiv Detail & Related papers (2025-11-03T15:30:58Z)
ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation [56.79698529022327]
Legal claims refer to the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution. This paper explores the problem of legal claim generation based on the given case's facts. We construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task.
arXiv Detail & Related papers (2025-08-24T07:19:25Z)
ASP2LJ: An Adversarial Self-Play Lawyer Augmented Legal Judgment Framework [21.003203706712643]
Legal Judgment Prediction (LJP) aims to predict judicial outcomes, including relevant legal charge, terms, and fines. Current datasets, derived from authentic cases, suffer from high human annotation costs and imbalanced distributions. We propose an Adversarial Self-Play Lawyer Augmented Legal Judgment Framework, called ASP2LJ. Our framework enables a judge to reference evolved lawyers' arguments, improving the objectivity, fairness, and rationality of judicial decisions.
arXiv Detail & Related papers (2025-06-11T06:55:40Z)
RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models [58.69183479148083]
Legal Judgment Prediction (LJP) is a pivotal task in legal AI. Existing LJP models integrate judicial precedents and legal knowledge for high performance. However, they neglect legal reasoning logic, a critical component of legal judgments requiring rigorous logical analysis. This paper proposes a rule-enhanced legal judgment prediction framework based on first-order logic (FOL) formalism and comparative learning (CL).
arXiv Detail & Related papers (2025-05-27T14:50:21Z)
A Law Reasoning Benchmark for LLM with Tree-Organized Structures including Factum Probandum, Evidence and Experiences [76.73731245899454]
We propose a transparent law reasoning schema enriched with hierarchical factum probandum, evidence, and implicit experience. Inspired by this schema, we introduce a challenging task, which takes a textual case description and outputs a hierarchical structure justifying the final decision. This benchmark paves the way for transparent and accountable AI-assisted law reasoning in the "Intelligent Court".
arXiv Detail & Related papers (2025-03-02T10:26:54Z)
AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction [56.797874973414636]
AnnoCaseLaw is a first-of-its-kind dataset of 471 meticulously annotated U.S. Appeals Court negligence cases. Our dataset lays the groundwork for more human-aligned, explainable Legal Judgment Prediction models. Results demonstrate that LJP remains a formidable task, with application of legal precedent proving particularly difficult.
arXiv Detail & Related papers (2025-02-28T19:14:48Z)
LawLLM: Law Large Language Model for the US Legal System [43.13850456765944]
We introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain. LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP). We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format.
arXiv Detail & Related papers (2024-07-27T21:51:30Z)
Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval [16.29803062332164]
We propose a few-shot approach where large language models assist in generating expert-aligned relevance judgments. The proposed approach decomposes the judgment process into several stages, mimicking the workflow of human annotators. It also ensures interpretable data labeling, providing transparency and clarity in the relevance assessment process.
arXiv Detail & Related papers (2024-03-27T09:46:56Z)
Multi-Defendant Legal Judgment Prediction via Hierarchical Reasoning [49.23103067844278]
We propose the task of multi-defendant LJP, which aims to automatically predict the judgment results for each defendant of multi-defendant cases. Two challenges arise with the task of multi-defendant LJP: (1) indistinguishable judgment results among various defendants; and (2) the lack of a real-world dataset for training and evaluation.
arXiv Detail & Related papers (2023-12-10T04:46:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.