Analysing similarities between legal court documents using natural
language processing approaches based on Transformers
- URL: http://arxiv.org/abs/2204.07182v3
- Date: Thu, 11 May 2023 08:33:49 GMT
- Title: Analysing similarities between legal court documents using natural
language processing approaches based on Transformers
- Authors: Raphael Souza de Oliveira and Erick Giovani Sperandio Nascimento
- Abstract summary: This work targets the problem of detecting the degree of similarity between judicial documents that can be achieved in the inference group.
It applies six NLP techniques based on the transformers architecture to a case study of legal proceedings in the Brazilian judicial system.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advances in Artificial Intelligence (AI) have yielded
promising results in solving complex problems in Natural Language Processing
(NLP), making it an important tool for the expeditious resolution of
judicial proceedings in the legal area. In this context, this work targets the
problem of detecting the degree of similarity between judicial documents that
can be achieved in the inference group, by applying six NLP techniques based on
the Transformer architecture to a case study of legal proceedings in the
Brazilian judicial system. The Transformer-based NLP models, namely BERT, GPT-2
and RoBERTa, were pre-trained on general-purpose corpora of the Brazilian
Portuguese language, and were then fine-tuned and specialised for the legal
sector using 210,000 legal proceedings. Vector representations of each legal
document were calculated from their embeddings and used to cluster the
lawsuits, with the quality of each model measured by the cosine distance
between the elements of a group and its centroid. We observed that the
Transformer-based models outperformed previous traditional NLP techniques,
with the RoBERTa model specialised for Brazilian Portuguese presenting the
best results. This methodology can also be applied to other case studies in
different languages, making it possible to advance the current state of the
art in NLP applied to the legal sector.
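The quality metric described in the abstract, the cosine of the distance between each element of a group and its centroid, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `cohesion` function and the synthetic 768-dimensional vectors standing in for document embeddings are assumptions for the example.

```python
import numpy as np

def cohesion(embeddings: np.ndarray) -> float:
    """Mean cosine similarity between each embedding and the group centroid.

    Values near 1.0 indicate a tight cluster; lower values indicate
    that the documents in the group are spread apart.
    """
    centroid = embeddings.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    # Normalise each row so the dot product with the centroid is a cosine.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return float((normed @ centroid).mean())

# Stand-ins for document embeddings (e.g. 768-dim, as in BERT-base models).
rng = np.random.default_rng(0)
group = rng.normal(size=(5, 768))
print(f"cluster cohesion: {cohesion(group):.3f}")
```

In the paper's setting, the rows of `embeddings` would come from the fine-tuned BERT, GPT-2, or RoBERTa models, with one group per cluster of lawsuits; the model whose clusters score highest on this metric is judged best.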
Related papers
- InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z) - Empowering Prior to Court Legal Analysis: A Transparent and Accessible Dataset for Defensive Statement Classification and Interpretation [5.646219481667151]
This paper introduces a novel dataset tailored for classification of statements made during police interviews, prior to court proceedings.
We introduce a fine-tuned DistilBERT model that achieves state-of-the-art performance in distinguishing truthful from deceptive statements.
We also present an XAI interface that empowers both legal professionals and non-specialists to interact with and benefit from our system.
arXiv Detail & Related papers (2024-05-17T11:22:27Z) - The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment approach to bridge the gap between large language models' English and non-English performance.
Experiment results show that the question alignment approach can be used to boost multilingual performance across diverse reasoning scenarios.
To understand the mechanism of its success, we analyze representation space, chain-of-thought and translation data scales.
arXiv Detail & Related papers (2024-05-02T14:49:50Z) - Towards A Structured Overview of Use Cases for Natural Language Processing in the Legal Domain: A German Perspective [43.662441393491584]
In recent years, the field of Legal Tech has risen in prevalence, as the Natural Language Processing (NLP) and legal disciplines have combined forces to digitalize legal processes.
In this work, we aim to build a structured overview of Legal Tech use cases, grounded in NLP literature, but also supplemented by voices from legal practice in Germany.
arXiv Detail & Related papers (2024-04-29T14:56:47Z) - Transformer-based Entity Legal Form Classification [43.75590166844617]
We propose the application of Transformer-based language models for classifying legal forms.
We employ various BERT variants and compare their performance against multiple traditional baselines.
Our findings demonstrate that pre-trained BERT variants outperform traditional text classification approaches in terms of F1 score.
arXiv Detail & Related papers (2023-10-19T14:11:43Z) - Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model
Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are the previous legal cases with similar facts, which are the basis for the judgment of the subsequent case in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z) - SAILER: Structure-aware Pre-trained Language Model for Legal Case
Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z) - An Uncommon Task: Participatory Design in Legal AI [64.54460979588075]
We examine a notable yet understudied AI design process in the legal domain that took place over a decade ago.
We show how an interactive simulation methodology allowed computer scientists and lawyers to become co-designers.
arXiv Detail & Related papers (2022-03-08T15:46:52Z) - Lex Rosetta: Transfer of Predictive Models Across Languages,
Jurisdictions, and Legal Domains [40.58709137006848]
We analyze the use of Language-Agnostic Sentence Representations in sequence labeling models using Gated Recurrent Units (GRUs) that are transferable across languages.
We found that models generalize beyond the contexts on which they were trained.
We found that training the models on multiple contexts increases robustness and improves overall performance when evaluating on previously unseen contexts.
arXiv Detail & Related papers (2021-12-15T04:53:13Z) - Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release the Longformer-based pre-trained language model, named as Lawformer, for Chinese legal long documents understanding.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z) - Predicting Legal Proceedings Status: Approaches Based on Sequential Text
Data [0.0]
This paper develops predictive models to classify Brazilian legal proceedings in three possible classes of status.
We combined several natural language processing (NLP) and machine learning techniques to solve the problem.
Our approaches achieved a maximum accuracy of .93 and top average F1 scores of .89 (macro) and .93 (weighted).
arXiv Detail & Related papers (2020-03-13T19:40:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.