U-CREAT: Unsupervised Case Retrieval using Events extrAcTion
- URL: http://arxiv.org/abs/2307.05260v1
- Date: Tue, 11 Jul 2023 13:51:12 GMT
- Title: U-CREAT: Unsupervised Case Retrieval using Events extrAcTion
- Authors: Abhinav Joshi, Akshat Sharma, Sai Kiran Tanikella, and Ashutosh Modi
- Abstract summary: We propose a new benchmark (in English) for the Prior Case Retrieval task: IL-PCR (Indian Legal Prior Case Retrieval) corpus.
We explore the role of events in legal case retrieval and propose an unsupervised retrieval method-based pipeline U-CREAT.
We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin.
- Score: 2.2385755093672044
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The task of Prior Case Retrieval (PCR) in the legal domain is about
automatically citing relevant (based on facts and precedence) prior legal cases
in a given query case. To further promote research in PCR, in this paper, we
propose a new large benchmark (in English) for the PCR task: IL-PCR (Indian
Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance
and the long size of legal documents, BM25 remains a strong baseline for
ranking the cited prior documents. In this work, we explore the role of events
in legal case retrieval and propose an unsupervised retrieval method-based
pipeline U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find
that the proposed unsupervised retrieval method significantly increases
performance compared to BM25 and makes retrieval faster by a considerable
margin, making it applicable to real-time case retrieval systems. Our proposed
system is generic: we show that it generalizes across two different legal
systems (Indian and Canadian) and achieves state-of-the-art performance on the
benchmarks for both legal systems (IL-PCR and COLIEE corpora).
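For readers unfamiliar with the BM25 baseline that the abstract compares against, the following is a minimal sketch of BM25 ranking in plain Python. It is not the U-CREAT pipeline and not the authors' code: whitespace tokenization, the function names `bm25_scores` and `rank_candidates`, and the parameter values `k1=1.5` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Return one BM25 score per document (hypothetical helper, illustrative only)."""
    n_docs = len(docs_tokens)
    # Average document length; fall back to 1 to avoid division by zero.
    avg_len = (sum(len(d) for d in docs_tokens) / n_docs) or 1

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            # Non-negative IDF variant (log of 1 plus the odds-like ratio).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank_candidates(query, candidates):
    """Rank candidate texts for a query; returns (indices best-first, scores)."""
    docs = [c.lower().split() for c in candidates]  # assumed: whitespace tokens
    scores = bm25_scores(query.lower().split(), docs)
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Toy candidate pool (invented sentences, not taken from IL-PCR or COLIEE).
    pool = [
        "the accused was granted bail by the court",
        "the tenant failed to pay rent to the landlord",
        "contract breach damages awarded",
    ]
    order, scores = rank_candidates("court granted bail to the accused", pool)
    print(order, scores)
```

In a prior case retrieval setting, the query would be a full case document and the candidates the pool of prior cases, so both sides are far longer than these toy sentences.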
Related papers
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
- Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs [67.54302101989542]
Legal case retrieval aims to provide similar cases as references for a given fact description.
Existing works mainly focus on case-to-case retrieval using lengthy queries.
Data scale is insufficient to satisfy the training requirements of existing data-hungry neural models.
arXiv Detail & Related papers (2024-10-09T06:26:39Z)
- LawLLM: Law Large Language Model for the US Legal System [43.13850456765944]
We introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain.
LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP).
We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format.
arXiv Detail & Related papers (2024-07-27T21:51:30Z)
- ECtHR-PCR: A Dataset for Precedent Understanding and Prior Case Retrieval in the European Court of Human Rights [1.3723120574076126]
We develop a prior case retrieval dataset based on judgements from the European Court of Human Rights (ECtHR).
We benchmark different lexical and dense retrieval approaches with various negative sampling strategies.
We find that difficulty-based negative sampling strategies were not effective for the PCR task.
arXiv Detail & Related papers (2024-03-31T08:06:54Z)
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
- MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal elements, with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
- An Intent Taxonomy of Legal Case Retrieval [43.22489520922202]
Legal case retrieval is a special Information Retrieval (IR) task focusing on legal case documents.
We present a novel hierarchical intent taxonomy of legal case retrieval.
We reveal significant differences in user behavior and satisfaction under different search intents in legal case retrieval.
arXiv Detail & Related papers (2023-07-25T07:27:32Z)
- Automated Refugee Case Analysis: An NLP Pipeline for Supporting Legal Practitioners [0.0]
We introduce an end-to-end pipeline for retrieving, processing, and extracting targeted information from legal cases.
We investigate an under-studied legal domain with a case study on refugee law in Canada.
arXiv Detail & Related papers (2023-05-24T19:37:23Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
- Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations [6.40476282000118]
REG-IR is an application of document-to-document information retrieval.
We show that fine-tuning a BERT model on an in-domain classification task produces the best representations for IR.
We also show that neural re-rankers under-perform due to contradicting supervision, i.e., similar query-document pairs with opposite labels.
arXiv Detail & Related papers (2021-01-26T11:38:15Z)
- Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora [63.429307282665704]
Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents.
CDCR aims to benefit downstream multi-document applications, but improvements from applying CDCR have not been shown yet.
We make the observation that every CDCR system to date was developed, trained, and tested only on a single respective corpus.
arXiv Detail & Related papers (2020-11-24T17:45:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.