Lex Rosetta: Transfer of Predictive Models Across Languages,
Jurisdictions, and Legal Domains
- URL: http://arxiv.org/abs/2112.07882v1
- Date: Wed, 15 Dec 2021 04:53:13 GMT
- Title: Lex Rosetta: Transfer of Predictive Models Across Languages,
Jurisdictions, and Legal Domains
- Authors: Jaromir Savelka, Hannes Westermann, Karim Benyekhlef, Charlotte S.
Alexander, Jayla C. Grant, David Restrepo Amariles, Rajaa El Hamdani,
Sébastien Meeùs, Michał Araszkiewicz, Kevin D. Ashley, Alexandra
Ashley, Karl Branting, Mattia Falduti, Matthias Grabmair, Jakub Harašta,
Tereza Novotná, Elizabeth Tippett, Shiwanni Johnson
- Abstract summary: We analyze the use of Language-Agnostic Sentence Representations in sequence labeling models using Gated Recurrent Units (GRUs) that are transferable across languages.
We found that models generalize beyond the contexts on which they were trained.
We found that training the models on multiple contexts increases robustness and improves overall performance when evaluating on previously unseen contexts.
- Score: 40.58709137006848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we examine the use of multi-lingual sentence embeddings to
transfer predictive models for functional segmentation of adjudicatory
decisions across jurisdictions, legal systems (common and civil law),
languages, and domains (i.e. contexts). Mechanisms for utilizing linguistic
resources outside of their original context have significant potential benefits
in AI & Law because differences between legal systems, languages, or traditions
often block wider adoption of research outcomes. We analyze the use of
Language-Agnostic Sentence Representations in sequence labeling models using
Gated Recurrent Units (GRUs) that are transferable across languages. To
investigate transfer between different contexts we developed an annotation
scheme for functional segmentation of adjudicatory decisions. We found that
models generalize beyond the contexts on which they were trained (e.g., a model
trained on administrative decisions from the US can be applied to criminal law
decisions from Italy). Further, we found that training the models on multiple
contexts increases robustness and improves overall performance when evaluating
on previously unseen contexts. Finally, we found that pooling the training data
from all the contexts enhances the models' in-context performance.
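Below is a minimal PyTorch sketch of the architecture the abstract describes: each sentence of a decision is first encoded with a language-agnostic sentence embedding (e.g., LASER, 1024-dimensional), and a bidirectional GRU then labels the sentence sequence with functional types. The label set, hyperparameters, and random embeddings are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a LASER-embeddings + GRU sequence labeler.
# Label set and hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn

LABELS = ["Heading", "Background", "Analysis", "Outcome"]  # hypothetical functional types

class SentenceSequenceLabeler(nn.Module):
    def __init__(self, emb_dim=1024, hidden=256, n_labels=len(LABELS)):
        super().__init__()
        # Bidirectional GRU reads the embeddings of a decision's sentences in order.
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, sent_embs):            # (batch, n_sents, emb_dim)
        h, _ = self.gru(sent_embs)           # (batch, n_sents, 2 * hidden)
        return self.out(h)                   # per-sentence label logits

# One "document" = a sequence of language-agnostic sentence embeddings.
# Real inputs would come from a LASER encoder; random tensors stand in here.
doc = torch.randn(1, 30, 1024)               # 30 sentences, 1024-dim embeddings
logits = SentenceSequenceLabeler()(doc)      # (1, 30, 4)
pred = logits.argmax(-1)                     # one functional label per sentence
```

Because the sentence encoder is language-agnostic, a labeler trained in one context can be applied to sentences from another language or jurisdiction without retraining the encoder, which is what enables the cross-context transfer the paper reports.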
Related papers
- A Multilingual Sentiment Lexicon for Low-Resource Language Translation using Large Languages Models and Explainable AI [0.0]
South Africa and the DRC present a complex linguistic landscape with languages such as Zulu, Sepedi, Afrikaans, French, English, and Tshiluba.
This study develops a multilingual lexicon designed for French and Tshiluba, now expanded to include translations in English, Afrikaans, Sepedi, and Zulu.
A comprehensive testing corpus is created to support translation and sentiment analysis tasks, with machine learning models trained to predict sentiment.
arXiv Detail & Related papers (2024-11-06T23:41:18Z)
- Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges [4.548047308860141]
Natural Language Processing is revolutionizing the way legal professionals and laypersons operate in the legal field.
This survey follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses framework, reviewing 148 studies, with a final selection of 127 after manual filtering.
It explores foundational concepts related to Natural Language Processing in the legal domain.
arXiv Detail & Related papers (2024-10-25T01:17:02Z)
- Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
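As a toy illustration of one information-theoretic policy of this kind, the sketch below selects the candidate string whose (binary) informant answer maximizes the expected entropy reduction over a finite set of hypothesis grammars. All hypotheses, strings, and names are invented for the example; the paper's model and policy menu differ in detail.

```python
# Toy expected-information-gain query selection over a finite hypothesis set.
# Each hypothesis maps a string to True/False (grammatical or not).
import math

def entropy(posterior):
    return -sum(w * math.log2(w) for w in posterior.values() if w > 0)

def info_gain(posterior, candidate):
    """Expected entropy drop from asking the informant about `candidate`."""
    gain = 0.0
    for answer in (True, False):
        consistent = {h: w for h, w in posterior.items() if h(candidate) == answer}
        p_answer = sum(consistent.values())
        if p_answer == 0:
            continue
        renorm = {h: w / p_answer for h, w in consistent.items()}
        gain += p_answer * (entropy(posterior) - entropy(renorm))
    return gain

def select_query(posterior, candidates):
    return max(candidates, key=lambda c: info_gain(posterior, c))

# Hypotheses disagree on whether "kp" onsets are allowed.
h1 = lambda s: not s.startswith("kp")
h2 = lambda s: True
posterior = {h1: 0.5, h2: 0.5}
print(select_query(posterior, ["kpano", "tano"]))  # "kpano": its answer is informative
```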
arXiv Detail & Related papers (2024-05-08T00:18:56Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
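One plausible reading of such question translation training, sketched below, is supervised fine-tuning on (non-English question, English question) pairs so the model maps questions across languages into a shared space. The pair format and prompt template are assumptions, not the paper's exact recipe.

```python
# Hypothetical construction of question-translation SFT examples
# (non-English question -> English question).
parallel_questions = [
    ("¿Cuánto es 12 por 8?", "What is 12 times 8?"),      # es -> en
    ("Combien font 12 fois 8 ?", "What is 12 times 8?"),  # fr -> en
]

def to_sft_example(src, tgt):
    # Plain prompt/completion pair; the real framework's format may differ.
    return {"prompt": f"Translate the question into English:\n{src}\n",
            "completion": tgt}

sft_data = [to_sft_example(s, t) for s, t in parallel_questions]
```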
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval [18.058942674792604]
We propose a novel few-shot workflow tailored to relevance judgment of legal cases.
By comparing the relevance judgments of LLMs and human experts, we empirically show that we can obtain reliable relevance judgments.
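A hedged sketch of what such a few-shot workflow could look like: labeled query/candidate exemplars are prepended to each new pair and the LLM completes the judgment. The exemplars, label vocabulary, and LLM call are placeholders, not the paper's exact prompt.

```python
# Illustrative few-shot prompt for case-relevance judgment with an LLM.
FEW_SHOT = """You judge whether a candidate case is relevant to a query case.

Query: <facts of query case A>
Candidate: <facts of candidate case B>
Judgment: relevant

Query: <facts of query case C>
Candidate: <facts of candidate case D>
Judgment: not relevant
"""

def build_prompt(query_case: str, candidate_case: str) -> str:
    return FEW_SHOT + f"\nQuery: {query_case}\nCandidate: {candidate_case}\nJudgment:"

# judgment = llm(build_prompt(query, candidate)).strip()  # hypothetical LLM call
```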
arXiv Detail & Related papers (2024-03-27T09:46:56Z)
- Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are previous legal cases with similar facts; in many national legal systems they serve as the basis for judging subsequent cases.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z)
- Customizing Contextualized Language Models for Legal Document Reviews [0.22940141855172028]
We show how different language models trained on general-domain corpora can best be customized for legal document review tasks.
We compare their efficiencies with respect to task performances and present practical considerations.
arXiv Detail & Related papers (2021-02-10T22:14:15Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
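In spirit, such a regularizer adds a penalty on the distance between representations of corresponding source- and target-language inputs on top of the task loss. A minimal sketch follows; note that the paper achieves alignment without external parallel resources, so the paired tensors, distance metric, and weighting here are purely illustrative assumptions about the general shape of the objective.

```python
# Sketch of a representation-alignment regularizer for cross-lingual training.
import torch
import torch.nn.functional as F

def alignment_loss(src_repr: torch.Tensor, tgt_repr: torch.Tensor) -> torch.Tensor:
    # src_repr, tgt_repr: (batch, dim) sentence-level encodings of corresponding inputs
    return F.mse_loss(src_repr, tgt_repr)

def total_loss(task_loss, src_repr, tgt_repr, lam=0.1):
    # Task objective plus weighted alignment penalty; lam is illustrative.
    return task_loss + lam * alignment_loss(src_repr, tgt_repr)
```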
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering [37.66486350122862]
This paper investigates the performance of natural language understanding approaches on statutory reasoning.
We introduce a dataset, together with a legal-domain text corpus.
We contrast this with a hand-constructed Prolog-based system, designed to fully solve the task.
arXiv Detail & Related papers (2020-05-11T16:54:42Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that current methods fall short of translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.