Related papers: LeCaRDv2: A Large-Scale Chinese Legal Case Retrieval Dataset

URL: http://arxiv.org/abs/2310.17609v1
Date: Thu, 26 Oct 2023 17:32:55 GMT
Title: LeCaRDv2: A Large-Scale Chinese Legal Case Retrieval Dataset
Authors: Haitao Li, Yunqiu Shao, Yueyue Wu, Qingyao Ai, Yixiao Ma, Yiqun Liu
Abstract summary: We introduce LeCaRDv2, a large-scale Legal Case Retrieval dataset (version 2). It consists of 800 queries and 55,192 candidates extracted from 4.3 million criminal case documents. We enrich the existing relevance criteria by considering three key aspects: characterization, penalty, procedure. It's important to note that all cases in the dataset have been annotated by multiple legal experts specializing in criminal law.
Score: 20.315416393247247
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As an important component of intelligent legal systems, legal case retrieval plays a critical role in ensuring judicial justice and fairness. However, the development of legal case retrieval technologies in the Chinese legal system is restricted by three problems in existing datasets: limited data size, narrow definitions of legal relevance, and naive candidate pooling strategies used in data sampling. To alleviate these issues, we introduce LeCaRDv2, a large-scale Legal Case Retrieval Dataset (version 2). It consists of 800 queries and 55,192 candidates extracted from 4.3 million criminal case documents. To the best of our knowledge, LeCaRDv2 is one of the largest Chinese legal case retrieval datasets, providing extensive coverage of criminal charges. Additionally, we enrich the existing relevance criteria by considering three key aspects: characterization, penalty, and procedure. These comprehensive criteria enrich the dataset and may provide a more holistic perspective. Furthermore, we propose a two-level candidate set pooling strategy that effectively identifies potential candidates for each query case. It's important to note that all cases in the dataset have been annotated by multiple legal experts specializing in criminal law. Their expertise ensures the accuracy and reliability of the annotations. We evaluate several state-of-the-art retrieval models on LeCaRDv2, demonstrating that there is still significant room for improvement in legal case retrieval. The details of LeCaRDv2 can be found at the anonymous website https://github.com/anonymous1113243/LeCaRDv2.

Related papers

ASP2LJ: An Adversarial Self-Play Lawyer Augmented Legal Judgment Framework [21.003203706712643]
Legal Judgment Prediction (LJP) aims to predict judicial outcomes, including relevant legal charge, terms, and fines. Current datasets, derived from authentic cases, suffer from high human annotation costs and imbalanced distributions. We propose an Adversarial Self-Play Lawyer Augmented Legal Judgment Framework, called ASP2LJ. Our framework enables a judge to reference evolved lawyers' arguments, improving the objectivity, fairness, and rationality of judicial decisions.
arXiv Detail & Related papers (2025-06-11T06:55:40Z)
LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation [5.243460995467895]
We present LEGAR BENCH, the first large-scale Korean Legal Case Retrieval benchmark, covering 411 diverse crime types in queries over 1.2M legal cases. We also present LegalSearchLM, a retrieval model that performs legal element reasoning over the query case and directly generates content grounded in the target cases.
arXiv Detail & Related papers (2025-05-28T09:02:41Z)
AppealCase: A Dataset and Benchmark for Civil Case Appeal Scenarios [47.83822985839837]
We present the AppealCase dataset, consisting of 10,000 pairs of real-world, matched first-instance and second-instance documents across 91 categories of civil cases. The dataset also includes detailed annotations along five dimensions central to appellate review: judgment reversals, reversal reasons, cited legal provisions, claim-level decisions, and whether there is new information in the second instance. Experimental results reveal that all current models achieve less than 50% F1 scores on the judgment reversal prediction task, highlighting the complexity and challenge of the appeal scenario.
arXiv Detail & Related papers (2025-05-22T10:50:33Z)
AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction [56.797874973414636]
AnnoCaseLaw is a first-of-its-kind dataset of 471 meticulously annotated U.S. Appeals Court negligence cases. Our dataset lays the groundwork for more human-aligned, explainable Legal Judgment Prediction models. Results demonstrate that LJP remains a formidable task, with application of legal precedent proving particularly difficult.
arXiv Detail & Related papers (2025-02-28T19:14:48Z)
Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs [67.54302101989542]
Legal case retrieval aims to provide similar cases as references for a given fact description. Existing works mainly focus on case-to-case retrieval using lengthy queries. Data scale is insufficient to satisfy the training requirements of existing data-hungry neural models.
arXiv Detail & Related papers (2024-10-09T06:26:39Z)
Learning Interpretable Legal Case Retrieval via Knowledge-Guided Case Reformulation [22.85652668826498]
This paper introduces KELLER, a legal knowledge-guided case reformulation approach based on large language models (LLMs). By incorporating professional legal knowledge about crimes and law articles, we enable large language models to accurately reformulate the original legal case into concise sub-facts of crimes.
arXiv Detail & Related papers (2024-06-28T08:59:45Z)
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval. We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability. Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. Existing SCR datasets only focus on the fact description section when judging the similarity between cases. We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal elements, with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
An Intent Taxonomy of Legal Case Retrieval [43.22489520922202]
Legal case retrieval is a special Information Retrieval (IR) task focusing on legal case documents. We present a novel hierarchical intent taxonomy of legal case retrieval. We reveal significant differences in user behavior and satisfaction under different search intents in legal case retrieval.
arXiv Detail & Related papers (2023-07-25T07:27:32Z)
SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system. Most existing language models have difficulty understanding the long-distance dependencies between different structures. We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
Legal Element-oriented Modeling with Multi-view Contrastive Learning for Legal Case Retrieval [3.909749182759558]
We propose an interaction-focused network for legal case retrieval with a multi-view contrastive learning objective. Case-view contrastive learning minimizes the hidden space distance between relevant legal case representations. We employ a legal element knowledge-aware indicator to detect legal elements of cases.
arXiv Detail & Related papers (2022-10-11T06:47:23Z)
LEVEN: A Large-Scale Chinese Legal Event Detection Dataset [82.44096140591675]
We present LEVEN, a large-scale Chinese LEgal eVENt detection dataset, with 8,116 legal documents and 150,977 human-annotated event mentions in 108 event types. LEVEN is the largest Legal Event Detection dataset and has dozens of times the data scale of others, which shall significantly promote the training and evaluation of LED methods.
arXiv Detail & Related papers (2022-03-16T11:40:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.