AMR4NLI: Interpretable and robust NLI measures from semantic graphs
- URL: http://arxiv.org/abs/2306.00936v2
- Date: Tue, 5 Sep 2023 13:36:27 GMT
- Title: AMR4NLI: Interpretable and robust NLI measures from semantic graphs
- Authors: Juri Opitz, Shira Wein, Julius Steen, Anette Frank, and Nathan Schneider
- Abstract summary: Natural language inference asks whether a given premise entails a given hypothesis.
We compare semantic structures to represent premise and hypothesis, including sets of contextualized embeddings and semantic graphs.
Our evaluation finds value in both contextualized embeddings and semantic graphs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of natural language inference (NLI) asks whether a given premise
(expressed in NL) entails a given NL hypothesis. NLI benchmarks contain human
ratings of entailment, but the meaning relationships driving these ratings are
not formalized. Can the underlying sentence pair relationships be made more
explicit in an interpretable yet robust fashion? We compare semantic structures
to represent premise and hypothesis, including sets of contextualized
embeddings and semantic graphs (Abstract Meaning Representations), and measure
whether the hypothesis is a semantic substructure of the premise, utilizing
interpretable metrics. Our evaluation on three English benchmarks finds value
in both contextualized embeddings and semantic graphs; moreover, they provide
complementary signals, and can be leveraged together in a hybrid model.
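To make the substructure idea concrete, the following is a minimal sketch (not the authors' exact metric; the model choice and the greedy-matching scheme are assumptions) of an asymmetric coverage score over contextualized token embeddings: each hypothesis token is matched to its most similar premise token, and a high average match suggests the hypothesis is semantically contained in the premise. The graph-based counterpart would score AMR subgraph overlap instead.

```python
# Minimal sketch (not the authors' exact metric): an asymmetric
# "coverage" score checking how well each hypothesis token is
# covered by some premise token in contextualized-embedding space.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def token_embeddings(sentence: str) -> torch.Tensor:
    """Contextualized embeddings for one sentence, special tokens removed."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    return hidden[1:-1]  # drop [CLS] and [SEP]

def coverage(premise: str, hypothesis: str) -> float:
    """Mean over hypothesis tokens of the best cosine match in the premise."""
    p = torch.nn.functional.normalize(token_embeddings(premise), dim=-1)
    h = torch.nn.functional.normalize(token_embeddings(hypothesis), dim=-1)
    sim = h @ p.T  # (|h|, |p|) cosine similarities
    return sim.max(dim=1).values.mean().item()

print(coverage("A man is playing a guitar on stage.", "A man plays music."))
```

Taking the max over premise tokens only keeps the score asymmetric, which matters for entailment: the premise may contain content the hypothesis lacks, but not vice versa.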
Related papers
- Evaluating Contextualized Representations of (Spanish) Ambiguous Words: A New Lexical Resource and Empirical Analysis
We evaluate semantic representations of Spanish ambiguous nouns in context in a suite of Spanish-language monolingual and multilingual BERT-based models.
We find that various BERT-based LMs' contextualized semantic representations capture some variance in human judgments but fall short of the human benchmark.
arXiv Detail & Related papers (2024-06-20T18:58:11Z)
- Log Probabilities Are a Reliable Estimate of Semantic Plausibility in Base and Instruction-Tuned Language Models
We evaluate the effectiveness of LogProbs and basic prompting to measure semantic plausibility.
We find that LogProbs offers a more reliable measure of semantic plausibility than direct zero-shot prompting.
We conclude that, even in the era of prompt-based evaluations, LogProbs constitute a useful metric of semantic plausibility (a minimal scoring sketch appears after this list).
arXiv Detail & Related papers (2024-03-21T22:08:44Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure corresponds to two tasks: (1) segmenting sentences within a document into their propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
arXiv Detail & Related papers (2022-12-21T04:03:33Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector as a feature for identifying the semantic entailment relationship between the text-hypothesis pair (a minimal sketch of this feature appears after this list).
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable AMR Meaning Features
We create similarity metrics that are highly effective, while also providing an interpretable rationale for their rating.
Our approach works in two steps: We first select AMR graph metrics that measure meaning similarity of sentences with respect to key semantic facets.
Second, we employ these metrics to induce Semantically Structured Sentence BERT embeddings, which are composed of different meaning aspects captured in different sub-spaces.
arXiv Detail & Related papers (2022-06-14T17:37:18Z)
- A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity and reading comprehension.
We provide token-level rationales that are carefully annotated to be sufficient, compact and comprehensive.
We conduct experiments on three typical models with three saliency methods, and unveil their strengths and weaknesses in terms of interpretability.
arXiv Detail & Related papers (2022-05-23T07:37:04Z)
- Exploring Lexical Irregularities in Hypothesis-Only Models of Natural Language Inference
Natural Language Inference (NLI) or Recognizing Textual Entailment (RTE) is the task of predicting the entailment relation between a pair of sentences.
Models that understand entailment should encode both the premise and the hypothesis.
Experiments by Poliak et al. revealed a strong preference of these models for patterns observed only in the hypothesis.
arXiv Detail & Related papers (2021-01-19T01:08:06Z)
- Measuring Association Between Labels and Free-Text Rationales
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance.
We demonstrate that pipelines (existing models for faithful extractive rationalization on information-extraction-style tasks) do not extend as reliably to "reasoning" tasks requiring free-text rationales.
We turn to models that jointly predict and rationalize, a class of widely used high-performance models for free-text rationalization whose faithfulness is not yet established.
arXiv Detail & Related papers (2020-10-24T03:40:56Z)
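As a companion to the LogProbs entry above, here is a minimal sketch of log-probability plausibility scoring under a causal language model; the model choice ("gpt2") and the scoring details are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of log-probability scoring for semantic plausibility
# (model choice "gpt2" is illustrative, not the paper's exact setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def sentence_logprob(sentence: str) -> float:
    """Sum of log P(token | prefix) over the sentence under the LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the returned loss is the mean NLL over the
        # (seq_len - 1) predicted tokens; undo the averaging to get a sum.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

# A plausible sentence should outscore an implausible one of similar length.
print(sentence_logprob("The chef cooked a delicious meal."))
print(sentence_logprob("The meal cooked a delicious chef."))
```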
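Similarly, the element-wise Manhattan distance feature mentioned in the Textual Entailment Recognition entry can be sketched as follows; the encoder ("all-MiniLM-L6-v2"), the logistic-regression classifier, and the toy data are illustrative assumptions.

```python
# Minimal sketch of an element-wise Manhattan distance feature for
# text-hypothesis pairs (encoder and classifier choices are assumptions,
# not the paper's exact setup).
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def pair_feature(text: str, hypothesis: str) -> np.ndarray:
    """|e_text - e_hyp|: per-dimension absolute differences."""
    e_text, e_hyp = encoder.encode([text, hypothesis])
    return np.abs(e_text - e_hyp)

# Train an entailment classifier on these features (toy data, shape only).
pairs = [("A dog runs in the park.", "An animal is outside."),
         ("A dog runs in the park.", "A cat sleeps indoors.")]
labels = [1, 0]  # 1 = entailment, 0 = non-entailment
X = np.stack([pair_feature(t, h) for t, h in pairs])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```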