AraLegal-BERT: A pretrained language model for Arabic Legal text
- URL: http://arxiv.org/abs/2210.08284v1
- Date: Sat, 15 Oct 2022 13:08:40 GMT
- Title: AraLegal-BERT: A pretrained language model for Arabic Legal text
- Authors: Muhammad AL-Qurishi and Sarah AlQaseemi and Riad Soussi
- Abstract summary: We introduce AraLegal-BERT, a bidirectional encoder Transformer-based model that has been thoroughly tested and carefully optimized.
We fine-tuned AraLegal-BERT and evaluated it against three BERT variants for Arabic in three natural language understanding (NLU) tasks.
The results show that the base version of AraLegal-BERT achieves better accuracy than the general and original BERT variants on legal text.
- Score: 0.399013650624183
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The effectiveness of the BERT model on multiple linguistic tasks has been
well documented. On the other hand, its potential for narrow and specific domains,
such as the legal domain, has not been fully explored. In this paper, we examine
how BERT can be used in the Arabic legal domain and customize this language model
for several downstream tasks, using several domain-relevant training and testing
datasets to train BERT from scratch. We introduce AraLegal-BERT, a bidirectional
encoder Transformer-based model that has been thoroughly tested and carefully
optimized with the goal of amplifying the impact of NLP-driven solutions concerning
jurisprudence, legal documents, and legal practice. We fine-tuned AraLegal-BERT and
evaluated it against three BERT variants for Arabic in three natural language
understanding (NLU) tasks. The results show that the base version of AraLegal-BERT
achieves better accuracy than the general and original BERT variants on legal text.
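Neither the abstract nor this page includes code; the sketch below illustrates, under stated assumptions, the kind of fine-tuning and evaluation workflow the abstract describes (adapting a BERT-style Arabic checkpoint to a legal-text NLU classification task) using the Hugging Face Transformers API. The checkpoint path, label count, and toy examples are hypothetical placeholders rather than artifacts released by the authors; a companion sketch of the from-scratch masked-language-model pretraining step appears after the related-papers list below.

```python
# Minimal fine-tuning sketch (assumptions noted above): adapt a BERT-style
# Arabic checkpoint to a legal-text classification task with Hugging Face
# Transformers. "path/to/aralegal-bert-base" is a hypothetical placeholder.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

CHECKPOINT = "path/to/aralegal-bert-base"  # hypothetical; not an official release
NUM_LABELS = 3                             # e.g. a 3-class legal-topic task

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT, num_labels=NUM_LABELS)

# Toy stand-ins for an Arabic legal NLU dataset (text -> class label).
train_texts = ["نص قانوني أول", "نص قانوني ثان"]  # placeholder legal sentences
train_labels = [0, 1]

class LegalTextDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts and labels for the Trainer API."""
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True,
                                   max_length=512)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

args = TrainingArguments(
    output_dir="aralegal-bert-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=LegalTextDataset(train_texts, train_labels))
trainer.train()
# Supplying an eval_dataset and a compute_metrics function to the Trainer
# would report accuracy on a held-out split.
```

In practice, the held-out legal-domain test sets and the competing Arabic BERT checkpoints mentioned in the abstract would be plugged into the same Trainer loop to reproduce the comparison, with only CHECKPOINT changing between runs.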
Related papers
- MultiLegalSBD: A Multilingual Legal Sentence Boundary Detection Dataset [0.0]
Sentence Boundary Detection (SBD) is one of the foundational building blocks of Natural Language Processing (NLP).
We curated a diverse multilingual legal dataset consisting of over 130,000 annotated sentences in 6 languages.
We trained and tested monolingual and multilingual models based on CRF, BiLSTM-CRF, and transformers, demonstrating state-of-the-art performance.
arXiv Detail & Related papers (2023-05-02T05:52:03Z)
- German BERT Model for Legal Named Entity Recognition [0.43461794560295636]
We fine-tune a popular BERT language model trained on German data (German BERT) on a Legal Entity Recognition (LER) dataset.
The results we achieve by fine-tuning German BERT on the LER dataset outperform the BiLSTM-CRF+ model used by the authors of the same LER dataset.
arXiv Detail & Related papers (2023-03-07T11:54:39Z)
- Unsupervised Law Article Mining based on Deep Pre-Trained Language Representation Models with Application to the Italian Civil Code [3.9342247746757435]
This study proposes an advanced approach to law article prediction for the Italian legal system based on a BERT (Bidirectional Representations from Transformers) learning framework.
We define LamBERTa models by fine-tuning an Italian pre-trained BERT on the Italian civil code or its portions, for law article retrieval as a classification task.
We provide insights into the explainability and interpretability of our LamBERTa models, and we present an extensive experimental analysis over query sets of different types.
arXiv Detail & Related papers (2021-12-02T11:02:00Z)
- JuriBERT: A Masked-Language Model Adaptation for French Legal Text [14.330469316695853]
We focus on creating a language model adapted to French legal text with the goal of helping law professionals.
We conclude that some specific tasks do not benefit from generic language models pre-trained on large amounts of data.
We release JuriBERT, a new set of BERT models adapted to the French legal domain.
arXiv Detail & Related papers (2021-10-04T14:51:24Z)
- Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking [66.76141128555099]
We propose a novel cross-lingual biomedical entity linking task (XL-BEL).
We first investigate the ability of standard knowledge-agnostic as well as knowledge-enhanced monolingual and multilingual LMs beyond the standard monolingual English BEL task.
We then address the challenge of transferring domain-specific knowledge in resource-rich languages to resource-poor ones.
arXiv Detail & Related papers (2021-05-30T00:50:00Z)
- Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release the Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z)
- Comparing the Performance of NLP Toolkits and Evaluation measures in Legal Tech [0.0]
We compare and analyze the pretrained neural language models XLNet (autoregressive) and BERT (autoencoder) on legal tasks.
The XLNet model performs better on our sequence classification task of legal opinion classification, whereas BERT produces better results on the NER task.
We use domain-specific pretraining and additional legal vocabulary to adapt the BERT model further to the legal domain.
arXiv Detail & Related papers (2021-03-12T11:06:32Z)
- Looking for Clues of Language in Multilingual BERT to Improve Cross-lingual Generalization [56.87201892585477]
Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information.
We control the output languages of multilingual BERT by manipulating the token embeddings.
arXiv Detail & Related papers (2020-10-20T05:41:35Z)
- It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT [54.84185432755821]
Multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages.
We study the word-level translation information embedded in mBERT and present two simple methods that expose remarkable translation capabilities with no fine-tuning.
arXiv Detail & Related papers (2020-10-16T09:49:32Z)
- LEGAL-BERT: The Muppets straight out of Law School [52.53830441117363]
We explore approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets.
Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain.
We release LEGAL-BERT, a family of BERT models intended to assist legal NLP research, computational law, and legal technology applications.
arXiv Detail & Related papers (2020-10-06T09:06:07Z)
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
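Several of the papers above (LEGAL-BERT, JuriBERT, Lawformer), like AraLegal-BERT itself, rest on pretraining a Transformer encoder on in-domain legal text with a masked-language-modeling objective before any fine-tuning. The sketch below, referenced after the abstract, shows that generic step with Hugging Face Transformers; the corpus file, tokenizer settings, and model size are illustrative assumptions, not the configurations used in any of the cited papers.

```python
# Generic masked-language-model pretraining sketch (illustrative settings only):
# train a WordPiece vocabulary and a BERT-style encoder from scratch on a
# plain-text legal corpus. "legal_corpus.txt" is a hypothetical file.
from tokenizers import BertWordPieceTokenizer
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, LineByLineTextDataset,
                          Trainer, TrainingArguments)

# 1. Learn a domain vocabulary from the raw corpus.
wp = BertWordPieceTokenizer()
wp.train(files=["legal_corpus.txt"], vocab_size=32000)
wp.save_model(".")  # writes vocab.txt

tokenizer = BertTokenizerFast(vocab_file="vocab.txt")

# 2. Initialize a BERT-base-sized encoder from scratch with the new vocabulary.
config = BertConfig(vocab_size=tokenizer.vocab_size)
model = BertForMaskedLM(config)

# 3. Pretrain with the standard 15% token-masking objective.
dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                file_path="legal_corpus.txt", block_size=512)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm=True, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-legal-from-scratch",
                         num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, data_collator=collator,
        train_dataset=dataset).train()
```

Training a fresh in-domain vocabulary, rather than reusing a general-domain one, keeps frequent legal terms as whole tokens; the cited papers differ mainly in corpus, language, and whether they pretrain from scratch or continue from a general-purpose checkpoint.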
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.