LegalNLP -- Natural Language Processing methods for the Brazilian Legal
Language
- URL: http://arxiv.org/abs/2110.15709v1
- Date: Tue, 5 Oct 2021 04:44:37 GMT
- Title: LegalNLP -- Natural Language Processing methods for the Brazilian Legal
Language
- Authors: Felipe Maia Polo, Gabriel Caiaffa Floriano Mendonça, Kauê
Capellato J. Parreira, Lucka Gianvechio, Peterson Cordeiro, Jonathan Batista
Ferreira, Leticia Maria Paz de Lima, Antônio Carlos do Amaral Maia, Renato
Vicente
- Abstract summary: We present and make available pre-trained language models (Phraser, Word2Vec, Doc2Vec, FastText, and BERT) for the Brazilian legal language.
This initiative is extremely helpful for the Brazilian legal field, which lacks other open and specific tools and language models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present and make available pre-trained language models (Phraser, Word2Vec,
Doc2Vec, FastText, and BERT) for the Brazilian legal language, a Python package
with functions to facilitate their use, and a set of demonstrations/tutorials
containing some applications involving them. Given that our material is built
upon legal texts coming from several Brazilian courts, this initiative is
extremely helpful for the Brazilian legal field, which lacks other open and
specific tools and language models. Our main objective is to catalyze the use
of natural language processing tools for legal texts analysis by the Brazilian
industry, government, and academia, providing the necessary tools and
accessible material.
Related papers
- Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges [4.548047308860141]
Natural Language Processing is revolutionizing the way legal professionals and laypersons operate in the legal field.
This survey follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses framework, reviewing 148 studies, with a final selection of 127 after manual filtering.
It explores foundational concepts related to Natural Language Processing in the legal domain.
arXiv Detail & Related papers (2024-10-25T01:17:02Z)
- Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model [1.3812010983144798]
This paper shows that we can leverage a large number of annotation-free legal documents without Chinese word segmentation to fine-tune a large-scale language model.
The fine-tuned model can also generate legal document drafts while protecting information privacy and improving information security.
arXiv Detail & Related papers (2024-06-06T16:00:20Z)
- CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models [59.91221728187576]
This paper introduces the CMU Linguistic Annotation Backend (CMULAB), an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models.
CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages.
arXiv Detail & Related papers (2024-04-03T02:21:46Z)
- Building a Language-Learning Game for Brazilian Indigenous Languages: A Case of Study [0.0]
We describe a process to automatically generate language exercises and questions from a dependency treebank and a lexical database for Tupian languages.
We conclude that new data gathering processes should be established in partnership with indigenous communities and oriented for educational purposes.
arXiv Detail & Related papers (2024-03-21T16:11:44Z)
- One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support [18.810320088441678]
This work introduces a novel NLP benchmark for the legal domain.
It challenges LLMs in five key dimensions, including processing long documents (up to 50K tokens), using domain-specific knowledge (embodied in legal texts), and multilingual understanding (covering five languages).
Our benchmark contains diverse datasets from the Swiss legal system, allowing for a comprehensive study of the underlying non-English, inherently multilingual legal system.
arXiv Detail & Related papers (2023-06-15T16:19:15Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- LegalRelectra: Mixed-domain Language Modeling for Long-range Legal Text Comprehension [6.442209435258797]
LegalRelectra is a legal-domain language model trained on mixed-domain legal and medical corpora.
Our training architecture implements the Electra framework, but utilizes Reformer instead of BERT for its generator and discriminator.
arXiv Detail & Related papers (2022-12-16T00:15:14Z)
- Generalizing Multimodal Pre-training into Multilingual via Language Acquisition [54.69707237195554]
English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been taken to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a MultiLingual Acquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into a multilingual one.
arXiv Detail & Related papers (2022-05-29T08:53:22Z)
- Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release Lawformer, a Longformer-based pre-trained language model for understanding long Chinese legal documents.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z)
- Revisiting Language Encoding in Learning Multilingual Representations [70.01772581545103]
We propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding.
XLP projects the word embeddings into language-specific semantic space, and then the projected embeddings will be fed into the Transformer model.
Experiments show that XLP can freely and significantly boost the model performance on extensive multilingual benchmark datasets.
arXiv Detail & Related papers (2021-02-16T18:47:10Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)