CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation
- URL: http://arxiv.org/abs/2502.17943v1
- Date: Tue, 25 Feb 2025 08:03:32 GMT
- Title: CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation
- Authors: Haitao Li, Jiaying Ye, Yiran Hu, Jia Chen, Qingyao Ai, Yueyue Wu, Junjie Chen, Yifan Chen, Cheng Luo, Quan Zhou, Yiqun Liu
- Abstract summary: We introduce CaseGen, the benchmark for multi-stage legal case documents generation in the Chinese legal domain. CaseGen is based on 500 real case samples annotated by legal experts and covers seven essential case sections. It supports four key tasks: drafting defense statements, writing trial facts, composing legal reasoning, and generating judgment results.
- Score: 22.98779736851499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Legal case documents play a critical role in judicial proceedings. As the number of cases continues to rise, the reliance on manual drafting of legal case documents is facing increasing pressure and challenges. The development of large language models (LLMs) offers a promising solution for automating document generation. However, existing benchmarks fail to fully capture the complexities involved in drafting legal case documents in real-world scenarios. To address this gap, we introduce CaseGen, the benchmark for multi-stage legal case documents generation in the Chinese legal domain. CaseGen is based on 500 real case samples annotated by legal experts and covers seven essential case sections. It supports four key tasks: drafting defense statements, writing trial facts, composing legal reasoning, and generating judgment results. To the best of our knowledge, CaseGen is the first benchmark designed to evaluate LLMs in the context of legal case document generation. To ensure an accurate and comprehensive evaluation, we design the LLM-as-a-judge evaluation framework and validate its effectiveness through human annotations. We evaluate several widely used general-domain LLMs and legal-specific LLMs, highlighting their limitations in case document generation and pinpointing areas for potential improvement. This work marks a step toward a more effective framework for automating legal case documents drafting, paving the way for the reliable application of AI in the legal field. The dataset and code are publicly available at https://github.com/CSHaitao/CaseGen.
Related papers
- Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej [5.790242888372048]
We introduce VidhikDastaavej, a novel, anonymized dataset of private legal documents.
We develop NyayaShilp, a fine-tuned legal document generation model specifically adapted to Indian legal texts.
arXiv Detail & Related papers (2025-04-04T14:41:50Z) - JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System [12.256518096712334]
JuDGE (Judgment Document Generation Evaluation) is a novel benchmark for evaluating the performance of judgment document generation in the Chinese legal system.
We construct a comprehensive dataset consisting of factual descriptions from real legal cases, paired with their corresponding full judgment documents.
In collaboration with legal professionals, we establish a comprehensive automated evaluation framework to assess the quality of generated judgment documents.
arXiv Detail & Related papers (2025-03-18T13:48:18Z) - AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction [56.797874973414636]
AnnoCaseLaw is a first-of-its-kind dataset of 471 meticulously annotated U.S. Appeals Court negligence cases.
Our dataset lays the groundwork for more human-aligned, explainable Legal Judgment Prediction models.
Results demonstrate that LJP remains a formidable task, with application of legal precedent proving particularly difficult.
arXiv Detail & Related papers (2025-02-28T19:14:48Z) - LegalAgentBench: Evaluating LLM Agents in Legal Domain [53.70993264644004]
LegalAgentBench is a benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain. LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge.
arXiv Detail & Related papers (2024-12-23T04:02:46Z) - It Cannot Be Right If It Was Written by AI: On Lawyers' Preferences of Documents Perceived as Authored by an LLM vs a Human [0.6827423171182154]
Large Language Models (LLMs) enable a future in which certain types of legal documents may be generated automatically.
This study is the necessary analysis of the ongoing transition towards mature generative AI systems.
Our analysis revealed a clear preference for documents perceived as crafted by a human over those believed to be generated by AI.
arXiv Detail & Related papers (2024-07-09T12:11:25Z) - InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z) - MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal elements, with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z) - Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are the previous legal cases with similar facts, which are the basis for the judgment of the subsequent case in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z) - CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding [15.685369142294693]
CaseEncoder is a legal document encoder that leverages fine-grained legal knowledge in both the data sampling and pre-training phases.
CaseEncoder significantly outperforms both existing general pre-training models and legal-specific pre-training models in zero-shot legal case retrieval.
arXiv Detail & Related papers (2023-05-09T12:40:19Z) - SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z) - Legal Case Document Summarization: Extractive and Abstractive Methods and their Evaluation [11.502115682980559]
Summarization of legal case judgement documents is a challenging problem in Legal NLP.
Few analyses exist of how different families of summarization models perform when applied to legal case documents.
arXiv Detail & Related papers (2022-10-14T05:43:08Z) - Incorporating Domain Knowledge for Extractive Summarization of Legal Case Documents [7.6340456946456605]
We propose an unsupervised summarization algorithm DELSumm for summarizing legal case documents.
Our proposed algorithm outperforms several supervised summarization models that are trained over thousands of document-summary pairs.
arXiv Detail & Related papers (2021-06-30T08:06:15Z) - Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release a Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z)