JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System
- URL: http://arxiv.org/abs/2503.14258v2
- Date: Thu, 20 Mar 2025 15:09:51 GMT
- Title: JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System
- Authors: Weihang Su, Baoqing Yue, Qingyao Ai, Yiran Hu, Jiaqi Li, Changyue Wang, Kaiyuan Zhang, Yueyue Wu, Yiqun Liu
- Abstract summary: JuDGE (Judgment Document Generation Evaluation) is a novel benchmark for evaluating the performance of judgment document generation in the Chinese legal system. We construct a comprehensive dataset consisting of factual descriptions from real legal cases, paired with their corresponding full judgment documents. In collaboration with legal professionals, we establish a comprehensive automated evaluation framework to assess the quality of generated judgment documents.
- Score: 12.256518096712334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces JuDGE (Judgment Document Generation Evaluation), a novel benchmark for evaluating the performance of judgment document generation in the Chinese legal system. We define the task as generating a complete legal judgment document from the given factual description of the case. To facilitate this benchmark, we construct a comprehensive dataset consisting of factual descriptions from real legal cases, paired with their corresponding full judgment documents, which serve as the ground truth for evaluating the quality of generated documents. This dataset is further augmented by two external legal corpora that provide additional legal knowledge for the task: one comprising statutes and regulations, and the other consisting of a large collection of past judgment documents. In collaboration with legal professionals, we establish a comprehensive automated evaluation framework to assess the quality of generated judgment documents across various dimensions. We evaluate various baseline approaches, including few-shot in-context learning, fine-tuning, and a multi-source retrieval-augmented generation (RAG) approach, using both general and legal-domain LLMs. The experimental results demonstrate that, while RAG approaches can effectively improve performance in this task, there is still substantial room for further improvement. All code and datasets are available at: https://github.com/oneal2000/JuDGE.
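The multi-source retrieval setup described in the abstract (one corpus of statutes, one of past judgments, both consulted before generation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the word-overlap scoring, the prompt layout, and the English placeholder texts are all assumptions made for illustration. Real JuDGE data is Chinese and would need a proper retriever and tokenizer.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Whitespace tokenization; an assumption that only suits the English toy data below.
    return text.lower().split()


def overlap_score(query: str, doc: str) -> int:
    # Count tokens shared between query and document (multiset intersection).
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum((q & d).values())


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    # Return the k documents with the highest overlap score.
    ranked = sorted(corpus, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(fact: str, statutes: list[str], past_judgments: list[str], k: int = 1) -> str:
    # Query both corpora with the case facts, then assemble one prompt
    # that a language model (not included here) would complete.
    parts = [
        "Relevant statutes:",
        *retrieve(fact, statutes, k),
        "Similar past judgments:",
        *retrieve(fact, past_judgments, k),
        "Case facts:",
        fact,
        "Write the full judgment document:",
    ]
    return "\n".join(parts)


# Hypothetical placeholder corpora, not taken from the JuDGE dataset.
statutes = [
    "Theft: whoever stole property from a victim shall be imprisoned",
    "Dangerous driving: whoever drove a vehicle while intoxicated shall be detained",
]
past_judgments = [
    "The defendant stole a phone and was sentenced for theft",
    "The defendant drove while intoxicated and was sentenced for dangerous driving",
]
fact = "The defendant stole a bicycle from the victim"

prompt = build_prompt(fact, statutes, past_judgments)
print(prompt)
```

The resulting prompt would then be passed to a general or legal-domain LLM; the generation step is left out because the abstract does not specify it.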
Related papers
- Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej [5.790242888372048]
We introduce VidhikDastaavej, a novel, anonymized dataset of private legal documents.
We develop NyayaShilp, a fine-tuned legal document generation model specifically adapted to Indian legal texts.
arXiv Detail & Related papers (2025-04-04T14:41:50Z)
- CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation [22.98779736851499]
We introduce CaseGen, the benchmark for multi-stage legal case documents generation in the Chinese legal domain.
CaseGen is based on 500 real case samples annotated by legal experts and covers seven essential case sections.
It supports four key tasks: drafting defense statements, writing trial facts, composing legal reasoning, and generating judgment results.
arXiv Detail & Related papers (2025-02-25T08:03:32Z)
- Named entity recognition for Serbian legal documents: Design, methodology and dataset development [0.0]
We present one solution for Named Entity Recognition (NER) in the case of legal documents written in the Serbian language.
It leverages pre-trained bidirectional encoder representations from transformers (BERT), carefully adapted to the specific task of identifying and classifying specific data points from textual content.
arXiv Detail & Related papers (2025-02-14T22:23:39Z)
- LegalAgentBench: Evaluating LLM Agents in Legal Domain [53.70993264644004]
LegalAgentBench is a benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain.
LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge.
arXiv Detail & Related papers (2024-12-23T04:02:46Z)
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
- A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents [0.5812284760539713]
We define this problem as "scarce annotated legal documents".
We propose a deep-learning-based classification framework which we call MESc.
We also propose an explanation extraction algorithm named ORSE.
arXiv Detail & Related papers (2023-09-19T12:18:28Z)
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of the State-of-The-Art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidences in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Incorporating Domain Knowledge for Extractive Summarization of Legal Case Documents [7.6340456946456605]
We propose an unsupervised summarization algorithm DELSumm for summarizing legal case documents.
Our proposed algorithm outperforms several supervised summarization models that are trained over thousands of document-summary pairs.
arXiv Detail & Related papers (2021-06-30T08:06:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.