Analysing similarities between legal court documents using natural
language processing approaches based on Transformers
- URL: http://arxiv.org/abs/2204.07182v3
- Date: Thu, 11 May 2023 08:33:49 GMT
- Title: Analysing similarities between legal court documents using natural
language processing approaches based on Transformers
- Authors: Raphael Souza de Oliveira and Erick Giovani Sperandio Nascimento
- Abstract summary: This work targets the problem of detecting the degree of similarity between judicial documents that can be achieved in the inference group.
It applies six NLP techniques based on the transformers architecture to a case study of legal proceedings in the Brazilian judicial system.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advances in Artificial Intelligence (AI) have yielded
promising results in solving complex problems in Natural Language Processing
(NLP), making it an important tool for the expeditious resolution of
judicial proceedings in the legal area. In this context, this work targets the
problem of detecting the degree of similarity between judicial documents that
can be achieved in the inference group, by applying six NLP techniques based on
the Transformer architecture to a case study of legal proceedings in the
Brazilian judicial system. The Transformer-based NLP models, namely BERT, GPT-2
and RoBERTa, were pre-trained on general-purpose corpora of the Brazilian
Portuguese language, and were then fine-tuned and specialised for the legal
sector using 210,000 legal proceedings. Vector representations of each legal
document were calculated from their embeddings and used to cluster the
lawsuits, with the quality of each model measured by the cosine distance
between the elements of a group and its centroid. We observed that the
Transformer-based models outperformed previous traditional NLP techniques,
with the RoBERTa model specialised for Brazilian Portuguese presenting the
best results. This methodology can also be applied to other case studies in
different languages, making it possible to advance the current state of the
art in NLP applied to the legal sector.
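The quality metric described in the abstract, the cosine of the distance between each element of a group and its centroid, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `cohesion` function and the synthetic 768-dimensional vectors standing in for document embeddings are assumptions for the example.

```python
import numpy as np

def cohesion(embeddings: np.ndarray) -> float:
    """Mean cosine similarity between each embedding and the group centroid.

    Values near 1.0 indicate a tight cluster; lower values indicate
    that the documents in the group are spread apart.
    """
    centroid = embeddings.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    # Normalise each row so the dot product with the centroid is a cosine.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return float((normed @ centroid).mean())

# Stand-ins for document embeddings (e.g. 768-dim, as in BERT-base models).
rng = np.random.default_rng(0)
group = rng.normal(size=(5, 768))
print(f"cluster cohesion: {cohesion(group):.3f}")
```

In the paper's setting, the rows of `embeddings` would come from the fine-tuned BERT, GPT-2, or RoBERTa models, with one group per cluster of lawsuits; the model whose clusters score highest on this metric is judged best.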
Related papers
- InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z) - Empowering Prior to Court Legal Analysis: A Transparent and Accessible Dataset for Defensive Statement Classification and Interpretation [5.646219481667151]
This paper introduces a novel dataset tailored for classification of statements made during police interviews, prior to court proceedings.
We introduce a fine-tuned DistilBERT model that achieves state-of-the-art performance in distinguishing truthful from deceptive statements.
We also present an XAI interface that empowers both legal professionals and non-specialists to interact with and benefit from our system.
arXiv Detail & Related papers (2024-05-17T11:22:27Z) - The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment approach to bridge the gap between large language models' English and non-English performance.
Experiment results show that the question alignment approach can be used to boost multilingual performance across diverse reasoning scenarios.
To understand the mechanism of its success, we analyze representation space, chain-of-thought and translation data scales.
arXiv Detail & Related papers (2024-05-02T14:49:50Z) - Towards A Structured Overview of Use Cases for Natural Language Processing in the Legal Domain: A German Perspective [43.662441393491584]
In recent years, the field of Legal Tech has risen in prevalence, as the Natural Language Processing (NLP) and legal disciplines have combined forces to digitalize legal processes.
In this work, we aim to build a structured overview of Legal Tech use cases, grounded in NLP literature, but also supplemented by voices from legal practice in Germany.
arXiv Detail & Related papers (2024-04-29T14:56:47Z) - Transformer-based Entity Legal Form Classification [43.75590166844617]
We propose the application of Transformer-based language models for classifying legal forms.
We employ various BERT variants and compare their performance against multiple traditional baselines.
Our findings demonstrate that pre-trained BERT variants outperform traditional text classification approaches in terms of F1 score.
arXiv Detail & Related papers (2023-10-19T14:11:43Z) - Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model
Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are the previous legal cases with similar facts, which are the basis for the judgment of the subsequent case in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z) - SAILER: Structure-aware Pre-trained Language Model for Legal Case
Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z) - An Uncommon Task: Participatory Design in Legal AI [64.54460979588075]
We examine a notable yet understudied AI design process in the legal domain that took place over a decade ago.
We show how an interactive simulation methodology allowed computer scientists and lawyers to become co-designers.
arXiv Detail & Related papers (2022-03-08T15:46:52Z) - Lex Rosetta: Transfer of Predictive Models Across Languages,
Jurisdictions, and Legal Domains [40.58709137006848]
We analyze the use of Language-Agnostic Sentence Representations in sequence labeling models using Gated Recurrent Units (GRUs) that are transferable across languages.
We found that models generalize beyond the contexts on which they were trained.
We found that training the models on multiple contexts increases robustness and improves overall performance when evaluating on previously unseen contexts.
arXiv Detail & Related papers (2021-12-15T04:53:13Z) - Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release the Longformer-based pre-trained language model, named as Lawformer, for Chinese legal long documents understanding.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z) - Predicting Legal Proceedings Status: Approaches Based on Sequential Text
Data [0.0]
This paper develops predictive models to classify Brazilian legal proceedings in three possible classes of status.
We combined several natural language processing (NLP) and machine learning techniques to solve the problem.
Our approaches achieved a maximum accuracy of .93 and top average F1 scores of .89 (macro) and .93 (weighted).
arXiv Detail & Related papers (2020-03-13T19:40:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.