Validity Assessment of Legal Will Statements as Natural Language Inference
- URL: http://arxiv.org/abs/2210.16989v1
- Date: Sun, 30 Oct 2022 23:53:13 GMT
- Title: Validity Assessment of Legal Will Statements as Natural Language Inference
- Authors: Alice Saebom Kwak, Jacob O. Israelsen, Clayton T. Morrison, Derek E. Bambauer and Mihai Surdeanu
- Abstract summary: This work introduces a natural language inference (NLI) dataset that focuses on the validity of statements in legal wills.
This dataset is unique because: (a) each entailment decision requires three inputs: the statement from the will, the law, and the conditions that hold at the time of the testator's death; and (b) the included texts are longer than the ones in current NLI datasets.
- Score: 16.292117261545226
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This work introduces a natural language inference (NLI) dataset that focuses
on the validity of statements in legal wills. This dataset is unique because:
(a) each entailment decision requires three inputs: the statement from the
will, the law, and the conditions that hold at the time of the testator's
death; and (b) the included texts are longer than the ones in current NLI
datasets. We trained eight neural NLI models on this dataset. All of the models achieve more than 80% macro F1 and accuracy, which indicates that neural approaches can handle this task reasonably well. However, group accuracy, a stricter evaluation measure that treats the group of positive and negative examples generated from the same statement as a single unit, is in the mid-80s at best, which suggests that the models' understanding of the task remains superficial. Further ablative analyses and explanation experiments indicate that all three text segments are used for prediction, but some decisions rely on semantically irrelevant tokens. This indicates that the models likely overfit to these longer texts, and that additional research is required before this task can be considered solved.
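The group accuracy metric described in the abstract can be sketched as follows: the positive and negative examples generated from the same will statement form one group, and the group counts as correct only if every example in it is classified correctly. This is a minimal illustration; the field names and label strings below are assumptions, not the dataset's actual schema.

```python
from collections import defaultdict

def group_accuracy(examples):
    """Fraction of statement groups in which *every* generated
    positive/negative example is classified correctly."""
    groups = defaultdict(list)
    for ex in examples:
        # Collect per-example correctness, keyed by source statement.
        groups[ex["statement_id"]].append(ex["pred"] == ex["gold"])
    correct = sum(all(flags) for flags in groups.values())
    return correct / len(groups)

# Two groups: s1 is fully correct, s2 has one error,
# so group accuracy is 0.5 even though 3/4 examples are right.
preds = [
    {"statement_id": "s1", "gold": "entail",     "pred": "entail"},
    {"statement_id": "s1", "gold": "contradict", "pred": "contradict"},
    {"statement_id": "s2", "gold": "entail",     "pred": "entail"},
    {"statement_id": "s2", "gold": "contradict", "pred": "entail"},
]
print(group_accuracy(preds))  # 0.5
```

This illustrates why group accuracy is stricter than plain accuracy: a model that answers the easy member of each group but misses the hard one scores well per-example yet poorly per-group.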
Related papers
- Detecting Response Generation Not Requiring Factual Judgment [14.921007421043198]
This study aims to achieve both attractiveness and factuality in dialogue responses by setting a task of predicting sentences that do not require a factual-correctness judgment.
For this task, we created a dialogue dataset annotated with fact-check-needed labels (DDFC) via crowdsourcing, and ran classification tasks on several models using this dataset.
The best model achieved about 88% classification accuracy.
arXiv Detail & Related papers (2024-06-14T04:03:24Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing interactions at the syntax-semantics interface.
This suggests LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Analyzing Vietnamese Legal Questions Using Deep Neural Networks with Biaffine Classifiers [3.116035935327534]
We propose using deep neural networks to extract important information from Vietnamese legal questions.
Given a legal question in natural language, the goal is to extract all the segments that contain the needed information to answer the question.
arXiv Detail & Related papers (2023-04-27T18:19:24Z)
- Multi-resolution Interpretation and Diagnostics Tool for Natural Language Classifiers [0.0]
This paper aims to create more flexible model-explainability summaries based on segments of observations or clusters of semantically related words.
In addition, we introduce a root cause analysis method for NLP models, by analyzing representative False Positive and False Negative examples from different segments.
arXiv Detail & Related papers (2023-03-06T22:59:02Z)
- Retrieval-based Disentangled Representation Learning with Natural Language Supervision [61.75109410513864]
We present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as proxies of the underlying data variation to drive disentangled representation learning.
Our approach employs a bi-encoder model to represent both data and natural language in a vocabulary space, enabling the model to distinguish intrinsic dimensions that capture characteristics within the data through their natural language counterparts, thus achieving disentanglement.
arXiv Detail & Related papers (2022-12-15T10:20:42Z)
- Revisiting text decomposition methods for NLI-based factuality scoring of summaries [9.044665059626958]
We show that fine-grained decomposition is not always a winning strategy for factuality scoring.
We also show that small changes to previously proposed entailment-based scoring methods can result in better performance.
arXiv Detail & Related papers (2022-11-30T09:54:37Z)
- ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models [82.63962107729994]
Any-Shot Data-to-Text (ASDOT) is a new approach flexibly applicable to diverse settings.
It consists of two steps, data disambiguation and sentence fusion.
Experimental results show that ASDOT consistently achieves significant improvement over baselines.
arXiv Detail & Related papers (2022-10-09T19:17:43Z)
- Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization [63.21819285337555]
We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples.
We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries.
We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
arXiv Detail & Related papers (2022-05-12T10:43:42Z)
- Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets.
Interventions and additional rounds of labeling can then be performed to ameliorate the semantic bias of a dataset's hypothesis distribution.
arXiv Detail & Related papers (2021-12-16T22:49:01Z)
- NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance [3.7024660695776066]
We propose a new diagnostic test suite that allows one to assess whether a dataset constitutes a good testbed for evaluating models' meaning-understanding capabilities.
Specifically, we apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI).
A large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities.
arXiv Detail & Related papers (2021-04-10T12:28:07Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields results comparable to or better than state-of-the-art zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.