Decontextualization: Making Sentences Stand-Alone
- URL: http://arxiv.org/abs/2102.05169v1
- Date: Tue, 9 Feb 2021 22:52:37 GMT
- Title: Decontextualization: Making Sentences Stand-Alone
- Authors: Eunsol Choi, Jennimaria Palomaki, Matthew Lamm, Tom Kwiatkowski,
Dipanjan Das, Michael Collins
- Abstract summary: Models for question answering, dialogue agents, and summarization often interpret the meaning of a sentence in a rich context.
Taking excerpts of text can be problematic, as key pieces may not be explicit in a local window.
We define the problem of sentence decontextualization: taking a sentence together with its context and rewriting it to be interpretable out of context.
- Score: 13.465459751619818
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Models for question answering, dialogue agents, and summarization often
interpret the meaning of a sentence in a rich context and use that meaning in a
new context. Taking excerpts of text can be problematic, as key pieces may not
be explicit in a local window. We isolate and define the problem of sentence
decontextualization: taking a sentence together with its context and rewriting
it to be interpretable out of context, while preserving its meaning. We
describe an annotation procedure, collect data on the Wikipedia corpus, and use
the data to train models to automatically decontextualize sentences. We present
preliminary studies that show the value of sentence decontextualization in a
user facing task, and as preprocessing for systems that perform document
understanding. We argue that decontextualization is an important subtask in
many downstream applications, and that the definitions and resources provided
can benefit tasks that operate on sentences that occur in a richer context.
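To make the task concrete, here is a minimal, invented illustration of the decontextualization input/output format (the sentences and the heuristic below are hypothetical, not drawn from the paper's data or models):

```python
# Hypothetical illustration of the decontextualization task: given a
# sentence and its enclosing context, rewrite the sentence so it is
# interpretable on its own while preserving its meaning.

example = {
    "context": (
        "Marie Curie was a physicist and chemist who conducted "
        "pioneering research on radioactivity."
    ),
    "sentence": "She was the first woman to win a Nobel Prize.",
    # Pronouns and other context-dependent references are resolved:
    "decontextualized": "Marie Curie was the first woman to win a Nobel Prize.",
}

def needs_context(sentence: str) -> bool:
    """Crude heuristic: a sentence opening with a pronoun likely depends
    on prior context. The paper trains models for this; this check is
    only for illustration."""
    return sentence.split()[0].lower() in {"she", "he", "it", "they", "this"}

print(needs_context(example["sentence"]))          # True
print(needs_context(example["decontextualized"]))  # False
```

The rewritten sentence carries the referent ("Marie Curie") explicitly, so it can be excerpted into a new context without losing meaning.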
Related papers
- ContextCite: Attributing Model Generation to Context [64.90535024385305]
We introduce the problem of context attribution, pinpointing the parts of the context that led a model to generate a particular statement.
We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model.
We showcase ContextCite through three applications: helping verify generated statements, improving response quality, and detecting poisoning attacks.
arXiv Detail & Related papers (2024-09-01T14:36:36Z)
- Explicating the Implicit: Argument Detection Beyond Sentence Boundaries [24.728886446551577]

We reformulate the problem of argument detection through textual entailment to capture semantic relations across sentence boundaries.
Our method does not require direct supervision, which is generally absent due to dataset scarcity.
We demonstrate it on a recent document-level benchmark, outperforming some supervised methods and contemporary language models.
arXiv Detail & Related papers (2024-08-08T06:18:24Z)
- PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
arXiv Detail & Related papers (2022-12-21T04:03:33Z)
- Dense Paraphrasing for Textual Enrichment [7.6233489924270765]
We define Dense Paraphrasing (DP) as the process of rewriting a textual expression (lexeme or phrase) so that it reduces ambiguity while making explicit the underlying semantics that is not (necessarily) expressed in the economy of sentence structure.
We build the first complete DP dataset, provide the scope and design of the annotation task, and present results demonstrating how this DP process can enrich a source text to improve inferencing and QA task performance.
arXiv Detail & Related papers (2022-10-20T19:58:31Z)
- Do Context-Aware Translation Models Pay the Right Attention? [61.25804242929533]
Context-aware machine translation models are designed to leverage contextual information, but often fail to do so.
In this paper, we ask several questions, among them: what contexts do human translators use to resolve ambiguous words?
We introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations.
Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words.
arXiv Detail & Related papers (2021-05-14T17:32:24Z)
- Measuring and Increasing Context Usage in Context-Aware Machine Translation [64.5726087590283]
We introduce a new metric, conditional cross-mutual information, to quantify the usage of context by machine translation models.
We then introduce a new, simple training method, context-aware word dropout, to increase the usage of context by context-aware models.
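As a sketch of the metric, conditional cross-mutual information can be estimated from the log-probabilities a model assigns to each target with and without the context; the simplified estimator below is a hypothetical reading of the idea, not the authors' implementation:

```python
def cxmi(logp_with_context, logp_without_context):
    """Estimate conditional cross-mutual information as the average gain
    in target log-probability when context C is provided:
        CXMI ~= mean_i [ log p(y_i | x_i, c_i) - log p(y_i | x_i) ]
    A positive value indicates the model's predictions use the context.
    """
    pairs = list(zip(logp_with_context, logp_without_context))
    return sum(a - b for a, b in pairs) / len(pairs)

# Toy per-sentence log-probabilities (nats) for three targets:
with_ctx = [-1.2, -0.8, -1.5]
without_ctx = [-1.6, -1.1, -1.5]
print(round(cxmi(with_ctx, without_ctx), 4))  # 0.2333
```

A value near zero (as for the third toy sentence) means the context changed nothing; context-aware word dropout is then a training-time intervention intended to push this quantity up.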
arXiv Detail & Related papers (2021-05-07T19:55:35Z)
- On the Use of Context for Predicting Citation Worthiness of Sentences in Scholarly Articles [10.28696219236292]
We formulate citation worthiness prediction as a sequence labeling task solved using a hierarchical BiLSTM model.
We contribute a new benchmark dataset containing over two million sentences and their corresponding labels.
Our results quantify the benefits of using context and contextual embeddings for citation worthiness.
arXiv Detail & Related papers (2021-04-18T21:47:30Z)
- Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding.
Given a multi-sentence narrative, the task is to decide whether there are any semantic discrepancies in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z)
- XTE: Explainable Text Entailment [8.036150169408241]
Entailment is the task of determining whether a piece of text logically follows from another piece of text.
XTE - Explainable Text Entailment - is a novel composite approach for recognizing text entailment.
arXiv Detail & Related papers (2020-09-25T20:49:07Z)
- Understanding Points of Correspondence between Sentences for Abstractive Summarization [39.7404761923196]
We present an investigation into fusing sentences drawn from a document by introducing the notion of points of correspondence.
We create a dataset containing the documents, source and fusion sentences, and human annotations of points of correspondence between sentences.
arXiv Detail & Related papers (2020-06-10T02:42:38Z)
- Context-based Transformer Models for Answer Sentence Selection [109.96739477808134]
In this paper, we analyze the role of the contextual information in the sentence selection task.
We propose a Transformer based architecture that leverages two types of contexts, local and global.
The results show that the combination of local and global contexts in a Transformer model significantly improves the accuracy in Answer Sentence Selection.
arXiv Detail & Related papers (2020-06-01T21:52:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.