Natural Language Decompositions of Implicit Content Enable Better Text Representations
- URL: http://arxiv.org/abs/2305.14583v2
- Date: Wed, 25 Oct 2023 00:08:17 GMT
- Title: Natural Language Decompositions of Implicit Content Enable Better Text Representations
- Authors: Alexander Hoyle, Rupak Sarkar, Pranav Goel, Philip Resnik
- Abstract summary: We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: When people interpret text, they rely on inferences that go beyond the
observed language itself. Inspired by this observation, we introduce a method
for the analysis of text that takes implicitly communicated content explicitly
into account. We use a large language model to produce sets of propositions
that are inferentially related to the text that has been observed, then
validate the plausibility of the generated content via human judgments.
Incorporating these explicit representations of implicit content proves useful
in multiple problem settings that involve the human interpretation of
utterances: assessing the similarity of arguments, making sense of a body of
opinion data, and modeling legislative behavior. Our results suggest that
modeling the meanings behind observed language, rather than the literal text
alone, is a valuable direction for NLP and particularly its applications to
social science.
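As an illustrative sketch of the general approach (not the authors' released pipeline: the prompt wording, model choices, and the max-over-pairs aggregation below are assumptions for demonstration), one could generate inferentially related propositions with an LLM and then compare two arguments through embeddings of those propositions rather than their surface text:

```python
# Illustrative sketch only: generate propositions that are inferentially related to an
# utterance with an LLM, then compare arguments through those propositions rather than
# their literal wording. Prompt text, model names, and the aggregation are assumptions.
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder would do

def decompose(utterance: str) -> list[str]:
    """Ask an LLM for short standalone propositions a reader would infer from the utterance."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model, not the one used in the paper
        messages=[{
            "role": "user",
            "content": (
                "List, one per line, short standalone propositions that a reader "
                f"would reasonably infer from this statement:\n\n{utterance}"
            ),
        }],
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.strip("-• ").strip() for line in lines if line.strip()]

def argument_similarity(arg_a: str, arg_b: str) -> float:
    """Score similarity between two arguments via their inferred propositions."""
    props_a, props_b = decompose(arg_a), decompose(arg_b)
    emb_a = embedder.encode(props_a, convert_to_tensor=True)
    emb_b = embedder.encode(props_b, convert_to_tensor=True)
    sims = util.cos_sim(emb_a, emb_b)  # pairwise cosine similarities
    # Symmetric max-over-pairs average: one simple aggregation choice, not the paper's.
    return float((sims.max(dim=1).values.mean() + sims.max(dim=0).values.mean()) / 2)

if __name__ == "__main__":
    print(argument_similarity(
        "We should ban billboards along the highway.",
        "Roadside advertising distracts drivers and spoils the landscape.",
    ))
```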
Related papers
- Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments
Single gold-standard interpretations rarely exist, challenging conventional assumptions in natural language processing.
This work introduces the interpretation modeling (IM) task which involves modeling several interpretations of a sentence's underlying semantics.
A first-of-its-kind IM dataset is curated to support experiments and analyses.
arXiv Detail & Related papers (2023-11-27T07:50:55Z)
- How Well Do Text Embedding Models Understand Syntax?
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations
SenteCon is a method for introducing human interpretability in deep language representations.
We show that SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks.
arXiv Detail & Related papers (2023-05-24T05:06:28Z)
- An Inclusive Notion of Text
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- Transparency Helps Reveal When Language Models Learn Meaning
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Improve Discourse Dependency Parsing with Contextualized Representations
We propose to take advantage of transformers to encode contextualized representations of units of different levels.
Motivated by the observation of writing patterns commonly shared across articles, we propose a novel method that treats discourse relation identification as a sequence labelling task.
arXiv Detail & Related papers (2022-05-04T14:35:38Z)
- A Latent-Variable Model for Intrinsic Probing
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering
This paper investigates the performance of natural language understanding approaches on statutory reasoning.
We introduce a dataset, together with a legal-domain text corpus.
We contrast this with a hand-constructed Prolog-based system, designed to fully solve the task.
arXiv Detail & Related papers (2020-05-11T16:54:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.