Related papers: U-CREAT: Unsupervised Case Retrieval using Events extrAcTion

U-CREAT: Unsupervised Case Retrieval using Events extrAcTion

URL: http://arxiv.org/abs/2307.05260v1
Date: Tue, 11 Jul 2023 13:51:12 GMT
Title: U-CREAT: Unsupervised Case Retrieval using Events extrAcTion
Authors: Abhinav Joshi and Akshat Sharma and Sai Kiran Tanikella and Ashutosh Modi
Abstract summary: We propose a new benchmark (in English) for the Prior Case Retrieval task: IL-PCR (Indian Legal Prior Case Retrieval) corpus. We explore the role of events in legal case retrieval and propose an unsupervised retrieval method-based pipeline U-CREAT. We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin.
Score: 2.2385755093672044
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The task of Prior Case Retrieval (PCR) in the legal domain is about automatically citing relevant (based on facts and precedence) prior legal cases in a given query case. To further promote research in PCR, in this paper, we propose a new large benchmark (in English) for the PCR task: IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the long size of legal documents, BM25 remains a strong baseline for ranking the cited prior documents. In this work, we explore the role of events in legal case retrieval and propose an unsupervised retrieval method-based pipeline U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin, making it applicable to real-time case retrieval systems. Our proposed system is generic, we show that it generalizes across two different legal systems (Indian and Canadian), and it shows state-of-the-art performance on the benchmarks for both the legal systems (IL-PCR and COLIEE corpora).

Related papers

ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation [56.79698529022327]
Legal claims refer to the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution.<n>This paper explores the problem of legal claim generation based on the given case's facts.<n>We construct ClaimGen-CN, the first dataset for Chinese legal claim generation task.
arXiv Detail & Related papers (2025-08-24T07:19:25Z)
Segment First, Retrieve Better: Realistic Legal Search via Rhetorical Role-Based Queries [3.552993426200889]
TraceRetriever mirrors real-world legal search by operating with limited case information.<n>Our pipeline integrates BM25, Vector Database, and Cross-Encoder models, combining initial results through Reciprocal Rank Fusion.<n> Rhetorical annotations are generated using a Hierarchical BiLSTM CRF classifier trained on Indian judgments.
arXiv Detail & Related papers (2025-08-01T14:49:33Z)
Augmented Question-guided Retrieval (AQgR) of Indian Case Law with LLM, RAG, and Structured Summaries [0.0]
This paper proposes the use of Large Language Models (LLMs) to facilitate the retrieval of relevant cases.<n>Our approach combines Retrieval Augmented Generation (RAG) with structured summaries optimized for Indian case law.<n>The system generates targeted legal questions based on factual scenarios to identify relevant case law more effectively.
arXiv Detail & Related papers (2025-07-23T05:24:44Z)
LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation [5.243460995467895]
We present LEGAR BENCH, the first large-scale Korean Legal Case Retrieval benchmark, covering 411 diverse crime types in queries over 1.2M legal cases.<n>We also present LegalSearchLM, a retrieval model that performs legal element reasoning over the query case and directly generates content grounded in the target cases.
arXiv Detail & Related papers (2025-05-28T09:02:41Z)
A Reasoning-Focused Legal Retrieval Benchmark [28.607778538115642]
We introduce two novel legal RAG benchmarks: Bar Exam QA and Housing Statute QA.<n>Our results suggest that legal RAG remains a challenging application, thus motivating future research.
arXiv Detail & Related papers (2025-05-06T20:44:03Z)
A Reproducibility Study of Graph-Based Legal Case Retrieval [1.6819960041696331]
CaseLink is a graph-based method for legal case retrieval. CaseLink captures higher-order relationships of cases going beyond the stand-alone level of documents. Challenges in reproducing novel results have recently been highlighted.
arXiv Detail & Related papers (2025-04-11T10:04:12Z)
JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance. We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods. In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs [67.54302101989542]
Legal case retrieval aims to provide similar cases as references for a given fact description. Existing works mainly focus on case-to-case retrieval using lengthy queries. Data scale is insufficient to satisfy the training requirements of existing data-hungry neural models.
arXiv Detail & Related papers (2024-10-09T06:26:39Z)
LawLLM: Law Large Language Model for the US Legal System [43.13850456765944]
We introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain. LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP) We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format.
arXiv Detail & Related papers (2024-07-27T21:51:30Z)
ECtHR-PCR: A Dataset for Precedent Understanding and Prior Case Retrieval in the European Court of Human Rights [1.3723120574076126]
We develop a prior case retrieval dataset based on judgements from the European Court of Human Rights (ECtHR) We benchmark different lexical and dense retrieval approaches with various negative sampling strategies. We find that difficulty-based negative sampling strategies were not effective for the PCR task.
arXiv Detail & Related papers (2024-03-31T08:06:54Z)
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval. We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability. Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. Existing SCR datasets only focus on the fact description section when judging the similarity between cases. We present M, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
An Intent Taxonomy of Legal Case Retrieval [43.22489520922202]
Legal case retrieval is a special Information Retrieval(IR) task focusing on legal case documents. We present a novel hierarchical intent taxonomy of legal case retrieval. We reveal significant differences in user behavior and satisfaction under different search intents in legal case retrieval.
arXiv Detail & Related papers (2023-07-25T07:27:32Z)
Automated Refugee Case Analysis: An NLP Pipeline for Supporting Legal Practitioners [0.0]
We introduce an end-to-end pipeline for retrieving, processing, and extracting targeted information from legal cases. We investigate an under-studied legal domain with a case study on refugee law in Canada.
arXiv Detail & Related papers (2023-05-24T19:37:23Z)
SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system. Most existing language models have difficulty understanding the long-distance dependencies between different structures. We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations [6.40476282000118]
REG-IR is an application of document-to-document information retrieval. We show that fine-tuning a BERT model on an in-domain classification task produces the best representations for IR. We also show that neural re-rankers under-perform due to contradicting supervision, i.e., similar query-document pairs with opposite labels.
arXiv Detail & Related papers (2021-01-26T11:38:15Z)
Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora [63.429307282665704]
Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents. CDCR aims to benefit downstream multi-document applications, but improvements from applying CDCR have not been shown yet. We make the observation that every CDCR system to date was developed, trained, and tested only on a single respective corpus.
arXiv Detail & Related papers (2020-11-24T17:45:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.