Validity Assessment of Legal Will Statements as Natural Language Inference
- URL: http://arxiv.org/abs/2210.16989v1
- Date: Sun, 30 Oct 2022 23:53:13 GMT
- Title: Validity Assessment of Legal Will Statements as Natural Language Inference
- Authors: Alice Saebom Kwak, Jacob O. Israelsen, Clayton T. Morrison, Derek E. Bambauer and Mihai Surdeanu
- Abstract summary: This work introduces a natural language inference (NLI) dataset that focuses on the validity of statements in legal wills.
This dataset is unique because: (a) each entailment decision requires three inputs: the statement from the will, the law, and the conditions that hold at the time of the testator's death; and (b) the included texts are longer than the ones in current NLI datasets.
- Score: 16.292117261545226
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This work introduces a natural language inference (NLI) dataset that focuses
on the validity of statements in legal wills. This dataset is unique because:
(a) each entailment decision requires three inputs: the statement from the
will, the law, and the conditions that hold at the time of the testator's
death; and (b) the included texts are longer than the ones in current NLI
datasets. We trained eight neural NLI models on this dataset. All of the models achieve more than 80% macro F1 and accuracy, which indicates that neural approaches can handle this task reasonably well. However, group accuracy, a stricter evaluation measure that treats the group of positive and negative examples generated from the same statement as a single unit, is in the mid-80s at best, which suggests that the models' understanding of the task remains superficial. Further ablative analyses and explanation experiments indicate that all three text segments are used for prediction, but some decisions rely on semantically irrelevant tokens. This indicates that the models likely overfit to these longer texts, and that additional research is required before this task can be considered solved.
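The group accuracy metric described in the abstract can be sketched as follows: the positive and negative examples generated from the same will statement form one group, and the group counts as correct only if every example in it is classified correctly. This is a minimal illustration; the field names and label strings below are assumptions, not the dataset's actual schema.

```python
from collections import defaultdict

def group_accuracy(examples):
    """Fraction of statement groups in which *every* generated
    positive/negative example is classified correctly."""
    groups = defaultdict(list)
    for ex in examples:
        # Collect per-example correctness, keyed by source statement.
        groups[ex["statement_id"]].append(ex["pred"] == ex["gold"])
    correct = sum(all(flags) for flags in groups.values())
    return correct / len(groups)

# Two groups: s1 is fully correct, s2 has one error,
# so group accuracy is 0.5 even though 3/4 examples are right.
preds = [
    {"statement_id": "s1", "gold": "entail",     "pred": "entail"},
    {"statement_id": "s1", "gold": "contradict", "pred": "contradict"},
    {"statement_id": "s2", "gold": "entail",     "pred": "entail"},
    {"statement_id": "s2", "gold": "contradict", "pred": "entail"},
]
print(group_accuracy(preds))  # 0.5
```

This illustrates why group accuracy is stricter than plain accuracy: a model that answers the easy member of each group but misses the hard one scores well per-example yet poorly per-group.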
Related papers
- Detecting Response Generation Not Requiring Factual Judgment [14.921007421043198]
This study aims to achieve both attractiveness and factuality in dialogue responses by setting a task of predicting sentences that do not require a factual-correctness judgment.
For this task, we created a dialogue dataset annotated with fact-check-needed labels (DDFC) via crowdsourcing, and ran classification tasks on several models using this dataset.
The best model achieved about 88% classification accuracy.
arXiv Detail & Related papers (2024-06-14T04:03:24Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing interactions at the syntax-semantics interface.
This suggests LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Analyzing Vietnamese Legal Questions Using Deep Neural Networks with Biaffine Classifiers [3.116035935327534]
We propose using deep neural networks to extract important information from Vietnamese legal questions.
Given a legal question in natural language, the goal is to extract all the segments that contain the needed information to answer the question.
arXiv Detail & Related papers (2023-04-27T18:19:24Z)
- Multi-resolution Interpretation and Diagnostics Tool for Natural Language Classifiers [0.0]
This paper aims to create more flexible model-explainability summaries based on segments of observations or clusters of semantically related words.
In addition, we introduce a root cause analysis method for NLP models, by analyzing representative False Positive and False Negative examples from different segments.
arXiv Detail & Related papers (2023-03-06T22:59:02Z)
- Retrieval-based Disentangled Representation Learning with Natural Language Supervision [61.75109410513864]
We present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as proxies of the underlying data variation to drive disentangled representation learning.
Our approach employs a bi-encoder model to represent both data and natural language in a vocabulary space, enabling the model to distinguish intrinsic dimensions that capture characteristics within the data through their natural language counterparts, thus achieving disentanglement.
arXiv Detail & Related papers (2022-12-15T10:20:42Z)
- Revisiting text decomposition methods for NLI-based factuality scoring of summaries [9.044665059626958]
We show that fine-grained decomposition is not always a winning strategy for factuality scoring.
We also show that small changes to previously proposed entailment-based scoring methods can result in better performance.
arXiv Detail & Related papers (2022-11-30T09:54:37Z)
- ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models [82.63962107729994]
Any-Shot Data-to-Text (ASDOT) is a new approach flexibly applicable to diverse settings.
It consists of two steps, data disambiguation and sentence fusion.
Experimental results show that ASDOT consistently achieves significant improvement over baselines.
arXiv Detail & Related papers (2022-10-09T19:17:43Z)
- Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization [63.21819285337555]
We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples.
We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries.
We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
arXiv Detail & Related papers (2022-05-12T10:43:42Z)
- Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets.
Interventions and additional rounds of labeling can then be performed to ameliorate the semantic bias of a dataset's hypothesis distribution.
arXiv Detail & Related papers (2021-12-16T22:49:01Z)
- NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance [3.7024660695776066]
We propose a new diagnostic test suite that allows one to assess whether a dataset constitutes a good testbed for evaluating models' meaning-understanding capabilities.
Specifically, we apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI).
A large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities.
arXiv Detail & Related papers (2021-04-10T12:28:07Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields results comparable to or better than state-of-the-art zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.