PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and
Entailment Recognition
- URL: http://arxiv.org/abs/2212.10750v2
- Date: Wed, 24 May 2023 23:19:14 GMT
- Title: PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and
Entailment Recognition
- Authors: Sihao Chen and Senaka Buthpitiya and Alex Fabrikant and Dan Roth and
Tal Schuster
- Abstract summary: We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
- Score: 63.51569687229681
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widely studied task of Natural Language Inference (NLI) requires a system
to recognize whether one piece of text is textually entailed by another, i.e.
whether the entirety of its meaning can be inferred from the other. In current
NLI datasets and models, textual entailment relations are typically defined on
the sentence- or paragraph-level. However, even a simple sentence often
contains multiple propositions, i.e. distinct units of meaning conveyed by the
sentence. As these propositions can carry different truth values in the context
of a given premise, we argue for the need to recognize the textual entailment
relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert
human raters. Our dataset structure resembles the tasks of (1) segmenting
sentences within a document to the set of propositions, and (2) classifying the
entailment relation of each proposition with respect to a different yet
topically-aligned document, i.e. documents describing the same event or entity.
We establish strong baselines for the segmentation and entailment tasks.
Through case studies on summary hallucination detection and document-level NLI,
we demonstrate that our conceptual framework is potentially useful for
understanding and explaining the compositionality of NLI labels.
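
The two tasks described above can be pictured with a small data-structure sketch. It is illustrative only and is not the authors' released format or code. It assumes a three-way label scheme (entailment, neutral, contradiction); the names `Label`, `Proposition`, and `sentence_label` are hypothetical, and the aggregation rule is one plausible way to compose proposition-level labels into a sentence-level label, not a rule taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Label(Enum):
    # Assumed three-way NLI label scheme (an assumption for this sketch).
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass
class Proposition:
    text: str     # one unit of meaning segmented from a hypothesis sentence
    label: Label  # relation of this proposition to the premise document


def sentence_label(props: List[Proposition]) -> Label:
    """Hypothetical aggregation rule, not taken from the paper.

    Any contradicted proposition makes the sentence a contradiction;
    the sentence is entailed only if every proposition is entailed;
    anything else (including an empty list) is treated as neutral.
    """
    labels = [p.label for p in props]
    if Label.CONTRADICTION in labels:
        return Label.CONTRADICTION
    if labels and all(lab is Label.ENTAILMENT for lab in labels):
        return Label.ENTAILMENT
    return Label.NEUTRAL


# Made-up example: one sentence split into two propositions.
example = [
    Proposition("The bridge opened in 1932.", Label.ENTAILMENT),
    Proposition("The bridge was designed by a local engineer.", Label.NEUTRAL),
]
print(sentence_label(example).value)  # prints "neutral"
```

Under this assumed rule, a sentence-level "neutral" label can hide the fact that one of its propositions is fully supported, which is the kind of detail that proposition-level labels expose.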
Related papers
- Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with Weak Supervision on Sentence Classification [91.13086984529706]
Aspect-based meeting transcript summarization aims to produce multiple summaries.
Traditional summarization methods produce one summary mixing information of all aspects.
We propose a two-stage method for aspect-based meeting transcript summarization.
arXiv Detail & Related papers (2023-11-07T19:06:31Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth value of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- A Survey of Implicit Discourse Relation Recognition [9.57170901247685]
Implicit discourse relation recognition (IDRR) aims to detect an implicit relation between two text segments that lack a connective and to classify its sense.
This article provides a comprehensive and up-to-date survey for the IDRR task.
arXiv Detail & Related papers (2022-03-06T15:12:53Z)
- Coherence-Based Distributed Document Representation Learning for Scientific Documents [9.646001537050925]
We propose a coupled text pair embedding (CTPE) model to learn the representation of scientific documents.
We use negative sampling to construct uncoupled text pairs whose two parts are from different documents.
We train the model to judge whether the text pair is coupled or uncoupled and use the obtained embedding of coupled text pairs as the embedding of documents.
arXiv Detail & Related papers (2022-01-08T15:29:21Z)
- XTE: Explainable Text Entailment [8.036150169408241]
Entailment is the task of determining whether a piece of text logically follows from another piece of text.
XTE - Explainable Text Entailment - is a novel composite approach for recognizing text entailment.
arXiv Detail & Related papers (2020-09-25T20:49:07Z)
- Understanding Points of Correspondence between Sentences for Abstractive Summarization [39.7404761923196]
We present an investigation into fusing sentences drawn from a document by introducing the notion of points of correspondence.
We create a dataset containing the documents, source and fusion sentences, and human annotations of points of correspondence between sentences.
arXiv Detail & Related papers (2020-06-10T02:42:38Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)