Automated Refugee Case Analysis: An NLP Pipeline for Supporting Legal
Practitioners
- URL: http://arxiv.org/abs/2305.15533v1
- Date: Wed, 24 May 2023 19:37:23 GMT
- Title: Automated Refugee Case Analysis: An NLP Pipeline for Supporting Legal
Practitioners
- Authors: Claire Barale, Michael Rovatsos, Nehal Bhuta
- Abstract summary: We introduce an end-to-end pipeline for retrieving, processing, and extracting targeted information from legal cases.
We investigate an under-studied legal domain with a case study on refugee law in Canada.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we introduce an end-to-end pipeline for retrieving,
processing, and extracting targeted information from legal cases. We
investigate an under-studied legal domain with a case study on refugee law in
Canada. Searching case law for past similar cases is a key part of legal work
for both lawyers and judges, the potential end-users of our prototype. While
traditional named-entity recognition labels such as dates provide meaningful
information in legal work, we propose to extend existing models and retrieve a
total of 19 useful categories of items from refugee cases. After creating a
novel data set of cases, we perform information extraction based on
state-of-the-art neural named-entity recognition (NER). We test different
architectures including two transformer models, using contextual and
non-contextual embeddings, and compare general-purpose versus domain-specific
pre-training. The results demonstrate that models pre-trained on legal data
perform best despite their smaller size, suggesting that domain matching had a
larger effect than network architecture. We achieve an F1 score above 90% on
five of the targeted categories and over 80% on four further categories.
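To make the extraction step concrete, the sketch below runs transformer-based token classification over an invented decision sentence using the Hugging Face transformers pipeline API. A public general-purpose NER checkpoint (dslim/bert-base-NER) stands in for a model fine-tuned on the paper's 19 refugee-case categories; neither the checkpoint nor the sentence is an artifact of the paper.

```python
# Minimal NER extraction sketch. The public CoNLL-style checkpoint used here
# is a stand-in; the paper fine-tunes models on 19 refugee-case categories,
# which would replace both the checkpoint and the label set.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",      # stand-in checkpoint, not the paper's model
    aggregation_strategy="simple",    # merge sub-word pieces into entity spans
)

text = (
    "The claimant, a citizen of Nigeria, appeared before the Refugee "
    "Protection Division in Toronto on 12 March 2018."
)

for entity in ner(text):
    # Each span carries a predicted category, the surface text, and a
    # confidence score that downstream legal-search tooling could filter on.
    print(f"{entity['entity_group']:>6}  {entity['word']!r}  score={entity['score']:.2f}")
```

Swapping in a legal-domain checkpoint, as the paper's pre-training comparison suggests, changes only the model argument; the surrounding pipeline stays the same.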
Related papers
- LawLLM: Law Large Language Model for the US Legal System [43.13850456765944]
We introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain.
LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP).
We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format.
arXiv Detail & Related papers (2024-07-27T21:51:30Z) - DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z) - MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement, with comprehensive sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z) - Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model
Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are previous legal cases with similar facts that serve as the basis for judging subsequent cases in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z) - CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case
Encoding [15.685369142294693]
CaseEncoder is a legal document encoder that leverages fine-grained legal knowledge in both the data sampling and pre-training phases.
CaseEncoder significantly outperforms both existing general pre-training models and legal-specific pre-training models in zero-shot legal case retrieval.
arXiv Detail & Related papers (2023-05-09T12:40:19Z) - SAILER: Structure-aware Pre-trained Language Model for Legal Case
Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z) - Attentive Deep Neural Networks for Legal Document Retrieval [2.4350217735794337]
We study the use of attentive neural network-based text representation for statute law document retrieval.
We develop two hierarchical architectures with sparse attention to represent long sentences and articles, and we name them Attentive CNN and Paraformer.
Experimental results show that Attentive neural methods substantially outperform non-neural methods in terms of retrieval performance across datasets and languages.
arXiv Detail & Related papers (2022-12-13T01:37:27Z) - LawngNLI: A Long-Premise Benchmark for In-Domain Generalization from
Short to Long Contexts and for Implication-Based Retrieval [72.4859717204905]
LawngNLI is constructed from U.S. legal opinions, with automatic labels of high human-validated accuracy.
It can serve as a benchmark for in-domain generalization from short to long contexts.
LawngNLI can train and test systems for implication-based case retrieval and argumentation.
arXiv Detail & Related papers (2022-12-06T18:42:39Z) - Incorporating Relevance Feedback for Information-Seeking Retrieval using
Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z) - When Does Pretraining Help? Assessing Self-Supervised Learning for Law
and the CaseHOLD Dataset [2.0924876102146714]
We present a new dataset of over 53,000 multiple-choice questions that require identifying the relevant holding of a cited case.
We show that domain pretraining may be warranted when the task exhibits sufficient similarity to the pretraining corpus.
Our findings inform when researchers should engage in resource-intensive pretraining and show that Transformer-based architectures, too, learn embeddings suggestive of distinct legal language.
arXiv Detail & Related papers (2021-04-18T00:57:16Z) - Legal Document Classification: An Application to Law Area Prediction of
Petitions to Public Prosecution Service [6.696983725360808]
This paper proposes the use of NLP techniques for textual classification.
Our main goal is to automate the process of assigning petitions to their respective areas of law.
The best results were obtained by combining Word2Vec embeddings trained on a domain-specific corpus with a Recurrent Neural Network architecture (a minimal sketch follows this list).
arXiv Detail & Related papers (2020-10-13T18:05:37Z)