Related papers: LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation

URL: http://arxiv.org/abs/2505.23832v1
Date: Wed, 28 May 2025 09:02:41 GMT
Title: LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation
Authors: Chaeeun Kim, Jinu Lee, Wonseok Hwang
Abstract summary: We present LEGAR BENCH, the first large-scale Korean Legal Case Retrieval benchmark, covering 411 diverse crime types in queries over 1.2M legal cases. We also present LegalSearchLM, a retrieval model that performs legal element reasoning over the query case and directly generates content grounded in the target cases.
Score: 5.243460995467895
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Legal Case Retrieval (LCR), which retrieves relevant cases from a query case, is a fundamental task for legal professionals in research and decision-making. However, existing studies on LCR face two major limitations. First, they are evaluated on relatively small-scale retrieval corpora (e.g., 100-55K cases) and use a narrow range of criminal query types, which cannot sufficiently reflect the complexity of real-world legal retrieval scenarios. Second, their reliance on embedding-based or lexical matching methods often results in limited representations and legally irrelevant matches. To address these issues, we present: (1) LEGAR BENCH, the first large-scale Korean LCR benchmark, covering 411 diverse crime types in queries over 1.2M legal cases; and (2) LegalSearchLM, a retrieval model that performs legal element reasoning over the query case and directly generates content grounded in the target cases through constrained decoding. Experimental results show that LegalSearchLM outperforms baselines by 6-20% on LEGAR BENCH, achieving state-of-the-art performance. It also demonstrates strong generalization to out-of-domain cases, outperforming naive generative models trained on in-domain data by 15%.
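
The abstract says only that generation is grounded in the target cases "through constrained decoding". One common way to realize that idea is a prefix trie over the token sequences allowed by the corpus. The following is a minimal sketch under that assumption, not the authors' implementation: the names `build_trie`, `constrained_greedy_decode`, the toy corpus, and the scoring function are all hypothetical stand-ins (a real system would use a language model's next-token scores).

```python
from typing import Callable, Dict, List, Sequence

END = "<eos>"  # hypothetical end-of-sequence marker


def build_trie(sequences: Sequence[Sequence[str]]) -> Dict:
    """Build a nested-dict prefix trie over the allowed token sequences."""
    root: Dict = {}
    for seq in sequences:
        node = root
        for tok in list(seq) + [END]:
            node = node.setdefault(tok, {})
    return root


def constrained_greedy_decode(
    trie: Dict,
    score: Callable[[List[str], str], float],
    max_len: int = 32,
) -> List[str]:
    """Greedy decoding restricted to continuations that exist in the trie."""
    prefix: List[str] = []
    node = trie
    while len(prefix) < max_len:
        allowed = sorted(node)  # only tokens that extend a corpus sequence
        if not allowed:
            break
        best = max(allowed, key=lambda tok: score(prefix, tok))
        if best == END:
            break
        prefix.append(best)
        node = node[best]
    return prefix


# Toy corpus of "legal element" token sequences (illustrative only).
corpus = [
    ["theft", "of", "vehicle"],
    ["theft", "by", "deception"],
    ["assault", "causing", "injury"],
]

# Toy scorer standing in for a language model. "robbery" has the highest
# score but is never emitted, because it does not appear in the trie.
prefs = {"robbery": 5.0, "theft": 2.0, "by": 1.5, "of": 1.0,
         "assault": 1.0, "deception": 1.0}


def toy_score(prefix: List[str], tok: str) -> float:
    return prefs.get(tok, 0.0)


result = constrained_greedy_decode(build_trie(corpus), toy_score)
print(result)
```

Because every step is limited to children of the current trie node, the output is guaranteed to be a sequence (or a prefix of one, if `max_len` is reached) that actually occurs in the corpus, which is what lets a generated string be mapped back to a concrete target case.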

Related papers

Segment First, Retrieve Better: Realistic Legal Search via Rhetorical Role-Based Queries [3.552993426200889]
TraceRetriever mirrors real-world legal search by operating with limited case information. Our pipeline integrates BM25, Vector Database, and Cross-Encoder models, combining initial results through Reciprocal Rank Fusion. Rhetorical annotations are generated using a Hierarchical BiLSTM CRF classifier trained on Indian judgments.
arXiv Detail & Related papers (2025-08-01T14:49:33Z)
AppealCase: A Dataset and Benchmark for Civil Case Appeal Scenarios [47.83822985839837]
We present the AppealCase dataset, consisting of 10,000 pairs of real-world, matched first-instance and second-instance documents across 91 categories of civil cases. The dataset also includes detailed annotations along five dimensions central to appellate review: judgment reversals, reversal reasons, cited legal provisions, claim-level decisions, and whether there is new information in the second instance. Experimental results reveal that all current models achieve less than 50% F1 scores on the judgment reversal prediction task, highlighting the complexity and challenge of the appeal scenario.
arXiv Detail & Related papers (2025-05-22T10:50:33Z)
AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction [56.797874973414636]
AnnoCaseLaw is a first-of-its-kind dataset of 471 meticulously annotated U.S. Appeals Court negligence cases. Our dataset lays the groundwork for more human-aligned, explainable Legal Judgment Prediction models. Results demonstrate that LJP remains a formidable task, with application of legal precedent proving particularly difficult.
arXiv Detail & Related papers (2025-02-28T19:14:48Z)
How Vital is the Jurisprudential Relevance: Law Article Intervened Legal Case Retrieval and Matching [31.378981566988063]
Legal case retrieval (LCR) aims to automatically scour for comparable legal cases based on a given query. To address them, a daunting challenge is assessing the uniquely defined legal-rational similarity within the judicial domain. We propose an end-to-end model named LCM-LAI to solve the above challenges.
arXiv Detail & Related papers (2025-02-25T15:29:07Z)
Evaluating LLM-based Approaches to Legal Citation Prediction: Domain-specific Pre-training, Fine-tuning, or RAG? A Benchmark and an Australian Law Case Study [9.30538764385435]
Large Language Models (LLMs) have demonstrated strong potential across legal tasks, yet the problem of legal citation prediction remains under-explored. We introduce the AusLaw Citation Benchmark, a real-world dataset comprising 55k Australian legal instances and 18,677 unique citations. We then conduct a systematic benchmarking across a range of solutions. Results show that neither general nor law-specific LLMs suffice as stand-alone solutions, with performance near zero.
arXiv Detail & Related papers (2024-12-09T07:46:14Z)
Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs [67.54302101989542]
Legal case retrieval aims to provide similar cases as references for a given fact description. Existing works mainly focus on case-to-case retrieval using lengthy queries. Data scale is insufficient to satisfy the training requirements of existing data-hungry neural models.
arXiv Detail & Related papers (2024-10-09T06:26:39Z)
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval. We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability. Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
LeCaRDv2: A Large-Scale Chinese Legal Case Retrieval Dataset [20.315416393247247]
We introduce LeCaRDv2, a large-scale Legal Case Retrieval dataset (version 2). It consists of 800 queries and 55,192 candidates extracted from 4.3 million criminal case documents. We enrich the existing relevance criteria by considering three key aspects: characterization, penalty, procedure. It's important to note that all cases in the dataset have been annotated by multiple legal experts specializing in criminal law.
arXiv Detail & Related papers (2023-10-26T17:32:55Z)
MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. Existing SCR datasets only focus on the fact description section when judging the similarity between cases. We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
U-CREAT: Unsupervised Case Retrieval using Events extrAcTion [2.2385755093672044]
We propose a new benchmark (in English) for the Prior Case Retrieval task: IL-PCR (Indian Legal Prior Case Retrieval) corpus. We explore the role of events in legal case retrieval and propose an unsupervised retrieval method-based pipeline U-CREAT. We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin.
arXiv Detail & Related papers (2023-07-11T13:51:12Z)
SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system. Most existing language models have difficulty understanding the long-distance dependencies between different structures. We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
Legal Element-oriented Modeling with Multi-view Contrastive Learning for Legal Case Retrieval [3.909749182759558]
We propose an interaction-focused network for legal case retrieval with a multi-view contrastive learning objective. Case-view contrastive learning minimizes the hidden space distance between relevant legal case representations. We employ a legal element knowledge-aware indicator to detect legal elements of cases.
arXiv Detail & Related papers (2022-10-11T06:47:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.