Automated Refugee Case Analysis: An NLP Pipeline for Supporting Legal
Practitioners
- URL: http://arxiv.org/abs/2305.15533v1
- Date: Wed, 24 May 2023 19:37:23 GMT
- Title: Automated Refugee Case Analysis: An NLP Pipeline for Supporting Legal
Practitioners
- Authors: Claire Barale, Michael Rovatsos, Nehal Bhuta
- Abstract summary: We introduce an end-to-end pipeline for retrieving, processing, and extracting targeted information from legal cases.
We investigate an under-studied legal domain with a case study on refugee law in Canada.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we introduce an end-to-end pipeline for retrieving,
processing, and extracting targeted information from legal cases. We
investigate an under-studied legal domain with a case study on refugee law in
Canada. Searching case law for past similar cases is a key part of legal work
for both lawyers and judges, the potential end-users of our prototype. While
traditional named-entity recognition labels such as dates provide meaningful
information in legal work, we propose to extend existing models and retrieve a
total of 19 useful categories of items from refugee cases. After creating a
novel data set of cases, we perform information extraction based on
state-of-the-art neural named-entity recognition (NER). We test different
architectures including two transformer models, using contextual and
non-contextual embeddings, and compare general-purpose versus domain-specific
pre-training. The results demonstrate that models pre-trained on legal data
perform best despite their smaller size, suggesting that domain matching had a
larger effect than network architecture. We achieve an F1 score above 90% on
five of the targeted categories and over 80% on four further categories.
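To make the extraction step concrete, the sketch below runs transformer-based token classification over an invented decision sentence using the Hugging Face transformers pipeline API. A public general-purpose NER checkpoint (dslim/bert-base-NER) stands in for a model fine-tuned on the paper's 19 refugee-case categories; neither the checkpoint nor the sentence is an artifact of the paper.

```python
# Minimal NER extraction sketch. The public CoNLL-style checkpoint used here
# is a stand-in; the paper fine-tunes models on 19 refugee-case categories,
# which would replace both the checkpoint and the label set.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",      # stand-in checkpoint, not the paper's model
    aggregation_strategy="simple",    # merge sub-word pieces into entity spans
)

text = (
    "The claimant, a citizen of Nigeria, appeared before the Refugee "
    "Protection Division in Toronto on 12 March 2018."
)

for entity in ner(text):
    # Each span carries a predicted category, the surface text, and a
    # confidence score that downstream legal-search tooling could filter on.
    print(f"{entity['entity_group']:>6}  {entity['word']!r}  score={entity['score']:.2f}")
```

Swapping in a legal-domain checkpoint, as the paper's pre-training comparison suggests, changes only the model argument; the surrounding pipeline stays the same.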
Related papers
- LawLLM: Law Large Language Model for the US Legal System [43.13850456765944]
We introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain.
LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP).
We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format.
arXiv Detail & Related papers (2024-07-27T21:51:30Z) - DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z) - MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement, with comprehensive sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z) - Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model
Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are previous legal cases with similar facts that serve as the basis for judging subsequent cases in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z) - CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case
Encoding [15.685369142294693]
CaseEncoder is a legal document encoder that leverages fine-grained legal knowledge in both the data sampling and pre-training phases.
CaseEncoder significantly outperforms both existing general pre-training models and legal-specific pre-training models in zero-shot legal case retrieval.
arXiv Detail & Related papers (2023-05-09T12:40:19Z) - SAILER: Structure-aware Pre-trained Language Model for Legal Case
Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z) - Attentive Deep Neural Networks for Legal Document Retrieval [2.4350217735794337]
We study the use of attentive neural network-based text representation for statute law document retrieval.
We develop two hierarchical architectures with sparse attention to represent long sentences and articles, and we name them Attentive CNN and Paraformer.
Experimental results show that Attentive neural methods substantially outperform non-neural methods in terms of retrieval performance across datasets and languages.
arXiv Detail & Related papers (2022-12-13T01:37:27Z) - LawngNLI: A Long-Premise Benchmark for In-Domain Generalization from
Short to Long Contexts and for Implication-Based Retrieval [72.4859717204905]
LawngNLI is constructed from U.S. legal opinions, with automatic labels of high human-validated accuracy.
It can serve as a benchmark for in-domain generalization from short to long contexts.
LawngNLI can train and test systems for implication-based case retrieval and argumentation.
arXiv Detail & Related papers (2022-12-06T18:42:39Z) - Incorporating Relevance Feedback for Information-Seeking Retrieval using
Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z) - When Does Pretraining Help? Assessing Self-Supervised Learning for Law
and the CaseHOLD Dataset [2.0924876102146714]
We present a new dataset of over 53,000 multiple-choice questions that require identifying the relevant holding of a cited case.
We show that domain pretraining may be warranted when the task exhibits sufficient similarity to the pretraining corpus.
Our findings inform when researchers should engage in resource-intensive pretraining and show that Transformer-based architectures, too, learn embeddings suggestive of distinct legal language.
arXiv Detail & Related papers (2021-04-18T00:57:16Z) - Legal Document Classification: An Application to Law Area Prediction of
Petitions to Public Prosecution Service [6.696983725360808]
This paper proposes the use of NLP techniques for textual classification.
Our main goal is to automate the process of assigning petitions to their respective areas of law.
The best results were obtained by combining Word2Vec embeddings trained on a domain-specific corpus with a Recurrent Neural Network architecture (a minimal sketch follows this list).
arXiv Detail & Related papers (2020-10-13T18:05:37Z)