Resolving Legalese: A Multilingual Exploration of Negation Scope Resolution in Legal Documents
- URL: http://arxiv.org/abs/2309.08695v1
- Date: Fri, 15 Sep 2023 18:38:06 GMT
- Title: Resolving Legalese: A Multilingual Exploration of Negation Scope Resolution in Legal Documents
- Authors: Ramona Christen, Anastassia Shaitarova, Matthias Stürmer, Joel Niklaus
- Abstract summary: The complexity of legal texts and the lack of annotated in-domain negation corpora pose challenges for state-of-the-art (SotA) models.
Our experiments demonstrate that models pre-trained without legal data underperform in the task of negation scope resolution.
We release a new set of annotated court decisions in German, French, and Italian and use it to improve negation scope resolution in both zero-shot and multilingual settings.
- Score: 3.8467652838774873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Resolving the scope of a negation within a sentence is a challenging NLP
task. The complexity of legal texts and the lack of annotated in-domain
negation corpora pose challenges for state-of-the-art (SotA) models when
performing negation scope resolution on multilingual legal data. Our
experiments demonstrate that models pre-trained without legal data underperform
in the task of negation scope resolution. Our experiments, using language
models exclusively fine-tuned on domains like literary texts and medical data,
yield inferior results compared to the outcomes documented in prior
cross-domain experiments. We release a new set of annotated court decisions in
German, French, and Italian and use it to improve negation scope resolution in
both zero-shot and multilingual settings. We achieve token-level F1-scores of
up to 86.7% in our zero-shot cross-lingual experiments, where the models are
trained on two languages of our legal datasets and evaluated on the third. Our
multilingual experiments, where the models were trained on all available
negation data and evaluated on our legal datasets, resulted in F1-scores of up
to 91.1%.
Related papers
- The Factuality of Large Language Models in the Legal Domain [8.111302195052641]
This paper investigates the factuality of large language models (LLMs) as knowledge bases in the legal domain.
We design a dataset of diverse factual questions about case law and legislation.
We then use the dataset to evaluate several LLMs under different evaluation methods, including exact, alias, and fuzzy matching.
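As a rough illustration of those three matching modes, the sketch below implements exact, alias, and fuzzy string matching. The alias table, the similarity measure (difflib), and the threshold are assumptions chosen for illustration, not the paper's actual settings.

```python
# Illustrative answer-matching modes: exact, alias, and fuzzy matching.
from difflib import SequenceMatcher

# Toy alias table; a real evaluation would draw aliases from the dataset.
ALIASES = {
    "court of justice of the european union": {"cjeu", "european court of justice"},
}

def exact_match(pred: str, gold: str) -> bool:
    return pred.strip().lower() == gold.strip().lower()

def alias_match(pred: str, gold: str) -> bool:
    p, g = pred.strip().lower(), gold.strip().lower()
    return exact_match(pred, gold) or p in ALIASES.get(g, set())

def fuzzy_match(pred: str, gold: str, threshold: float = 0.8) -> bool:
    # Character-level similarity ratio; the 0.8 threshold is an assumption.
    return SequenceMatcher(None, pred.lower(), gold.lower()).ratio() >= threshold

print(fuzzy_match("Court of Justice of the EU", "Court of Justice of the European Union"))  # True
```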
arXiv Detail & Related papers (2024-09-18T08:30:20Z) - AGB-DE: A Corpus for the Automated Legal Assessment of Clauses in German Consumer Contracts [4.427516854041417]
We introduce AGB-DE, a corpus of 3,764 clauses from German consumer contracts that have been annotated and legally assessed by legal experts.
We compare the performance of an SVM baseline with three fine-tuned open language models and the performance of GPT-3.5.
An analysis of the errors indicates that one of the main challenges could be the correct interpretation of complex clauses.
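The sketch below shows the general shape of such an SVM baseline (TF-IDF features plus a linear SVM) for clause assessment. The toy clauses, labels, and label scheme are invented; the AGB-DE corpus itself is not loaded here.

```python
# Sketch of a TF-IDF + linear SVM baseline for clause classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy clauses (not from AGB-DE); 1 = potentially void, 0 = unproblematic
# (this binary scheme is an assumption for illustration).
clauses = [
    "Der Anbieter haftet nicht für leichte Fahrlässigkeit.",
    "Der Vertrag kann monatlich gekündigt werden.",
]
labels = [1, 0]

baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
baseline.fit(clauses, labels)
print(baseline.predict(["Eine Haftung ist in jedem Fall ausgeschlossen."]))
```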
arXiv Detail & Related papers (2024-06-10T21:27:13Z) - Zero-shot Cross-lingual Stance Detection via Adversarial Language Adaptation [7.242609314791262]
This paper introduces a novel approach to zero-shot cross-lingual stance detection, Multilingual Translation-Augmented BERT (MTAB).
Our technique employs translation augmentation to improve zero-shot performance and pairs it with adversarial learning to further boost model efficacy.
We demonstrate the effectiveness of our proposed approach, showcasing improved results in comparison to a strong baseline model as well as ablated versions of our model.
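The translation-augmentation idea can be sketched as follows: labeled source-language examples are machine-translated into the target language and added to the training set with their labels preserved. Here `translate` is a stand-in for an arbitrary MT system, the adversarial language-adaptation component is omitted, and nothing below reproduces the actual MTAB implementation.

```python
# Conceptual sketch of translation augmentation for cross-lingual training.
from typing import Callable

def augment_with_translations(
    examples: list[tuple[str, str]],       # (text, stance label) pairs
    translate: Callable[[str], str],       # hypothetical MT function
) -> list[tuple[str, str]]:
    """Add a machine-translated copy of every labeled example."""
    augmented = list(examples)
    for text, label in examples:
        augmented.append((translate(text), label))  # the label carries over
    return augmented

# Usage with a dummy "translator" that only tags the text:
print(augment_with_translations([("I support the policy", "favor")], lambda t: f"[de] {t}"))
```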
arXiv Detail & Related papers (2024-04-22T16:56:43Z) - MultiLegalSBD: A Multilingual Legal Sentence Boundary Detection Dataset [0.0]
Sentence Boundary Detection (SBD) is one of the foundational building blocks of Natural Language Processing (NLP).
We curated a diverse multilingual legal dataset consisting of over 130,000 annotated sentences in 6 languages.
We trained and tested monolingual and multilingual models based on CRF, BiLSTM-CRF, and transformers, demonstrating state-of-the-art performance.
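One common way to frame SBD as sequence labeling, sketched below, is to label each token as sentence-final or not; CRF, BiLSTM-CRF, or transformer taggers can then be trained on such labels. The tiny German example and character offsets are invented and this is not the paper's exact preprocessing.

```python
# Illustrative framing of SBD as token-level sequence labeling:
# each whitespace token is labeled 1 if it ends a gold sentence, else 0.
def sbd_labels(text: str, sentence_ends: set[int]) -> list[tuple[str, int]]:
    """sentence_ends holds character offsets (exclusive) where gold sentences end."""
    labeled, pos = [], 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        labeled.append((token, int(end in sentence_ends)))
        pos = end
    return labeled

text = "Art. 5 Abs. 2 ist anwendbar. Die Beschwerde wird abgewiesen."
print(sbd_labels(text, sentence_ends={28, 60}))
```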
arXiv Detail & Related papers (2023-05-02T05:52:03Z) - Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal
Negation [59.307534363825816]
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
We introduce a natural language inference (NLI) test suite to enable probing the capabilities of NLP methods.
arXiv Detail & Related papers (2022-10-06T23:39:01Z) - Improving negation detection with negation-focused pre-training [58.32362243122714]
Negation is a common linguistic feature that is crucial in many language understanding tasks.
Recent work has shown that state-of-the-art NLP models underperform on samples containing negation.
We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking.
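The negation-masking idea can be sketched as biasing the masked-language-model masking distribution toward negation cues, so the model is forced to reconstruct them more often. The cue list and masking probabilities below are illustrative assumptions, not the paper's settings.

```python
# Sketch of negation-focused masking: negation cues are selected for
# masking with a higher probability than other tokens.
import random

NEGATION_CUES = {"not", "no", "never", "without", "n't", "none"}

def masking_decisions(tokens: list[str], p_cue: float = 0.5, p_other: float = 0.15) -> list[bool]:
    # p_cue and p_other are illustrative, not the paper's hyperparameters.
    return [
        random.random() < (p_cue if tok.lower() in NEGATION_CUES else p_other)
        for tok in tokens
    ]

tokens = "The defendant did not breach the contract".split()
masked = ["[MASK]" if m else t for t, m in zip(tokens, masking_decisions(tokens))]
print(" ".join(masked))
```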
arXiv Detail & Related papers (2022-05-09T02:41:11Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of
Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - Improving Multilingual Translation by Representation and Gradient
Regularization [82.42760103045083]
We propose a joint approach that regularizes NMT models at both the representation level and the gradient level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z) - AmericasNLI: Evaluating Zero-shot Natural Language Understanding of
Pretrained Multilingual Models in Truly Low-resource Languages [75.08199398141744]
We present AmericasNLI, an extension of XNLI (Conneau et al.), to 10 indigenous languages of the Americas.
We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches.
We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%.
arXiv Detail & Related papers (2021-04-18T05:32:28Z) - Towards Making the Most of Context in Neural Machine Translation [112.9845226123306]
We argue that previous research did not make clear use of the global context.
We propose a new document-level NMT framework that deliberately models the local context of each sentence.
arXiv Detail & Related papers (2020-02-19T03:30:00Z)