Can You Learn Semantics Through Next-Word Prediction? The Case of Entailment
- URL: http://arxiv.org/abs/2402.13956v3
- Date: Wed, 17 Jul 2024 17:49:42 GMT
- Title: Can You Learn Semantics Through Next-Word Prediction? The Case of Entailment
- Authors: William Merrill, Zhaofeng Wu, Norihito Naka, Yoon Kim, Tal Linzen
- Abstract summary: Merrill et al. argue that, in theory, sentence co-occurrence probabilities predicted by an optimal LM should reflect the entailment relationship of the constituent sentences.
We investigate whether their theory can be used to decode entailment relations from neural LMs.
We find that a test similar to theirs can decode entailment relations between natural sentences, well above random chance, though not perfectly.
- Score: 36.82878715850013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Do LMs infer the semantics of text from co-occurrence patterns in their training data? Merrill et al. (2022) argue that, in theory, sentence co-occurrence probabilities predicted by an optimal LM should reflect the entailment relationship of the constituent sentences, but it is unclear whether probabilities predicted by neural LMs encode entailment in this way because of strong assumptions made by Merrill et al. (namely, that humans always avoid redundancy). In this work, we investigate whether their theory can be used to decode entailment relations from neural LMs. We find that a test similar to theirs can decode entailment relations between natural sentences, well above random chance, though not perfectly, across many datasets and LMs. This suggests LMs implicitly model aspects of semantics to predict semantic effects on sentence co-occurrence patterns. However, we find the test that predicts entailment in practice works in the opposite direction to the theoretical test. We thus revisit the assumptions underlying the original test, finding its derivation did not adequately account for redundancy in human-written text. We argue that better accounting for redundancy related to explanations might derive the observed flipped test and, more generally, improve computational models of speakers in linguistics.
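The co-occurrence idea in the abstract can be made concrete with a small probe. The sketch below is only an illustration of the general setup, not the authors' derived test statistic: it scores a premise-hypothesis pair by how much conditioning on the premise shifts a causal LM's log-probability of the hypothesis. The model choice ("gpt2"), the `cooccurrence_score` function, and the example sentence pairs are illustrative assumptions.

```python
# Minimal sketch, assuming a HuggingFace causal LM ("gpt2") and a simple
# conditional-vs-marginal score; this is NOT the paper's derived test statistic.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def conditional_logprob(text: str, prefix: str) -> float:
    """Sum of log-probabilities of the tokens of `text`, given `prefix`."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    text_ids = tokenizer(text, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, text_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # predictions for positions 1..N-1, given everything before them
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = log_probs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # keep only the log-probabilities of the `text` tokens (drop the prefix)
    return token_lp[0, prefix_ids.shape[1] - 1:].sum().item()

def cooccurrence_score(premise: str, hypothesis: str) -> float:
    """log p(hypothesis | premise) - log p(hypothesis | BOS): how much the
    premise changes the LM's probability of the hypothesis."""
    cond = conditional_logprob(" " + hypothesis, premise)
    marg = conditional_logprob(" " + hypothesis, tokenizer.bos_token)
    return cond - marg

# Hypothetical example pairs: an entailment and an unrelated control.
print(cooccurrence_score("A man is playing a guitar on stage.",
                         "A man is playing an instrument."))
print(cooccurrence_score("A man is playing a guitar on stage.",
                         "The committee rejected the budget proposal."))
```

Interpreting the sign of such a score is exactly where the paper finds the practical test running opposite to the theoretical one, so any threshold or direction would have to be validated against labeled entailment data rather than assumed.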
Related papers
- QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios [15.193544498311603]
We present QUITE, a dataset of real-world Bayesian reasoning scenarios with categorical random variables and complex relationships.
We conduct an extensive set of experiments, finding that logic-based models outperform out-of-the-box large language models on all reasoning types.
Our results provide evidence that neuro-symbolic models are a promising direction for improving complex reasoning.
arXiv Detail & Related papers (2024-10-14T12:44:59Z)
- Log Probabilities Are a Reliable Estimate of Semantic Plausibility in Base and Instruction-Tuned Language Models [50.15455336684986]
We evaluate the effectiveness of LogProbs and basic prompting to measure semantic plausibility.
We find that LogProbs offers a more reliable measure of semantic plausibility than direct zero-shot prompting.
We conclude that, even in the era of prompt-based evaluations, LogProbs constitute a useful metric of semantic plausibility.
arXiv Detail & Related papers (2024-03-21T22:08:44Z)
- Incoherent Probability Judgments in Large Language Models [5.088721610298991]
We assess the coherence of probability judgments made by autoregressive Large Language Models (LLMs).
Our results show that the judgments produced by these models are often incoherent, displaying human-like systematic deviations from the rules of probability theory.
arXiv Detail & Related papers (2024-01-30T00:40:49Z)
- Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary [65.268245109828]
The non-human-like behaviour of contemporary pre-trained language models (PLMs) is a leading factor undermining their trustworthiness.
A striking phenomenon is the generation of inconsistent predictions, which produces contradictory results.
We propose a practical approach that alleviates the inconsistent behaviour issue by improving PLM awareness.
arXiv Detail & Related papers (2023-10-24T06:15:15Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupted images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence [45.9949173746044]
We show that large-size pre-trained language models (PLMs) do not satisfy the logical negation property (LNP).
We propose a novel intermediate training task, named meaning-matching, designed to directly learn a meaning-text correspondence.
We find that the task enables PLMs to learn lexical semantic information.
arXiv Detail & Related papers (2022-05-08T08:37:36Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- The Language Model Understood the Prompt was Ambiguous: Probing Syntactic Uncertainty Through Generation [23.711953448400514]
We inspect the extent to which neural language models (LMs) exhibit uncertainty over such syntactic analyses.
We find that LMs can track multiple analyses simultaneously.
As a response to disambiguating cues, the LMs often select the correct interpretation, but occasional errors point to potential areas of improvement.
arXiv Detail & Related papers (2021-09-16T10:27:05Z)
- HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference [38.14399396661415]
We derive adversarial examples that target the hypothesis-only bias.
We investigate two debiasing approaches which exploit the artificial pattern modeling to mitigate such hypothesis-only bias.
arXiv Detail & Related papers (2020-03-05T16:46:35Z)