CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case
Encoding
- URL: http://arxiv.org/abs/2305.05393v1
- Date: Tue, 9 May 2023 12:40:19 GMT
- Title: CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case
Encoding
- Authors: Yixiao Ma, Yueyue Wu, Weihang Su, Qingyao Ai, Yiqun Liu
- Abstract summary: CaseEncoder is a legal document encoder that leverages fine-grained legal knowledge in both the data sampling and pre-training phases.
CaseEncoder significantly outperforms both existing general pre-training models and legal-specific pre-training models in zero-shot legal case retrieval.
- Score: 15.685369142294693
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Legal case retrieval is a critical process for modern legal information
systems. While recent studies have utilized pre-trained language models (PLMs)
based on the general domain self-supervised pre-training paradigm to build
models for legal case retrieval, there are limitations in using general domain
PLMs as backbones. Specifically, these models may not fully capture the
underlying legal features in legal case documents. To address this issue, we
propose CaseEncoder, a legal document encoder that leverages fine-grained legal
knowledge in both the data sampling and pre-training phases. In the data
sampling phase, we enhance the quality of the training data by utilizing
fine-grained law article information to guide the selection of positive and
negative examples. In the pre-training phase, we design legal-specific
pre-training tasks that align with the judging criteria of relevant legal
cases. Based on these tasks, we introduce an innovative loss function called
Biased Circle Loss to enhance the model's ability to recognize fine-grained case
relevance. Experimental results on multiple benchmarks demonstrate that
CaseEncoder significantly outperforms both existing general pre-training models
and legal-specific pre-training models in zero-shot legal case retrieval.
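The abstract outlines, but does not specify, how law articles guide example sampling or how the Biased Circle Loss is defined. The following minimal PyTorch sketch illustrates one plausible reading: graded positives chosen by law-article overlap, and a Circle-Loss-style objective whose margins depend on the relevance grade. The function names, grading scheme, and grade-to-margin mapping are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F


def grade_by_article_overlap(query_articles, candidate_articles):
    # Assumption: candidates citing more of the same law articles as the query
    # are treated as stronger positives; no overlap means a negative (grade 0).
    overlap = len(set(query_articles) & set(candidate_articles))
    return min(overlap, 3)  # cap grades for this sketch


def biased_circle_loss(query_emb, doc_embs, grades, gamma=32.0, base_margin=0.25):
    """Circle-Loss-style objective with relevance-grade-dependent margins.

    query_emb: (d,) encoded query case; doc_embs: (n, d) candidate cases;
    grades: (n,) integer relevance grades, 0 = negative.
    """
    sims = F.cosine_similarity(query_emb.unsqueeze(0), doc_embs, dim=-1)  # (n,)
    pos_mask, neg_mask = grades > 0, grades == 0

    # "Biased" margin (assumption): higher-grade positives get a smaller slack,
    # so they must end up closer to the query than weakly relevant positives.
    pos_margin = base_margin / grades[pos_mask].clamp(min=1).float()

    s_p, s_n = sims[pos_mask], sims[neg_mask]
    # Standard Circle Loss weighting and decision boundaries (Sun et al., 2020).
    alpha_p = torch.clamp(1.0 + pos_margin - s_p, min=0.0)
    alpha_n = torch.clamp(s_n + base_margin, min=0.0)
    delta_p, delta_n = 1.0 - pos_margin, base_margin

    logit_p = -gamma * alpha_p * (s_p - delta_p)
    logit_n = gamma * alpha_n * (s_n - delta_n)
    # loss = log(1 + sum_n exp(logit_n) * sum_p exp(logit_p))
    return F.softplus(torch.logsumexp(logit_n, dim=0) + torch.logsumexp(logit_p, dim=0))
```

A hypothetical pre-training step would encode a query case and its sampled candidates with the document encoder, grade each candidate with grade_by_article_overlap, and back-propagate biased_circle_loss over the resulting similarities.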
Related papers
- LawLLM: Law Large Language Model for the US Legal System [43.13850456765944]
We introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain.
LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP)
We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format.
arXiv Detail & Related papers (2024-07-27T21:51:30Z)
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
- Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval [18.058942674792604]
We propose a novel few-shot workflow tailored to the relevance judgment of legal cases.
By comparing the relevance judgments of LLMs and human experts, we empirically show that we can obtain reliable relevance judgments.
arXiv Detail & Related papers (2024-03-27T09:46:56Z)
- Towards Explainability in Legal Outcome Prediction Models [64.00172507827499]
We argue that precedent is a natural way of facilitating explainability for legal NLP models.
By developing a taxonomy of legal precedent, we are able to compare human judges and neural models.
We find that while the models learn to predict outcomes reasonably well, their use of precedent is unlike that of human judges.
arXiv Detail & Related papers (2024-03-25T15:15:41Z)
- PILOT: Legal Case Outcome Prediction with Case Law [43.680862577060765]
We identify two unique challenges in making legal case outcome predictions with case law.
First, it is crucial to identify relevant precedent cases that serve as fundamental evidence for judges during decision-making.
Second, it is necessary to consider the evolution of legal principles over time, as early cases may adhere to different legal contexts.
arXiv Detail & Related papers (2024-01-28T21:18:05Z) - Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model
Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are previous legal cases with similar facts that serve as the basis for judging subsequent cases in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z)
- Automated Refugee Case Analysis: An NLP Pipeline for Supporting Legal Practitioners [0.0]
We introduce an end-to-end pipeline for retrieving, processing, and extracting targeted information from legal cases.
We investigate an under-studied legal domain with a case study on refugee law in Canada.
arXiv Detail & Related papers (2023-05-24T19:37:23Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
- Do Charge Prediction Models Learn Legal Theory? [59.74220430434435]
We argue that trustworthy charge prediction models should take legal theories into consideration.
We propose three principles that trustworthy models should follow in this task: sensitivity, selectivity, and presumption of innocence.
Our findings indicate that, while existing charge prediction models meet the selective principle on a benchmark dataset, most of them are still not sensitive enough and do not satisfy the presumption of innocence.
arXiv Detail & Related papers (2022-10-31T07:32:12Z)
- Legal Element-oriented Modeling with Multi-view Contrastive Learning for Legal Case Retrieval [3.909749182759558]
We propose an interaction-focused network for legal case retrieval with a multi-view contrastive learning objective.
Case-view contrastive learning minimizes the hidden space distance between relevant legal case representations.
We employ a legal element knowledge-aware indicator to detect legal elements of cases.
arXiv Detail & Related papers (2022-10-11T06:47:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.