Probing neural language models for understanding of words of estimative
probability
- URL: http://arxiv.org/abs/2211.03358v2
- Date: Sun, 25 Jun 2023 11:00:29 GMT
- Title: Probing neural language models for understanding of words of estimative
probability
- Authors: Damien Sileo and Marie-Francine Moens
- Abstract summary: Words of estimative probability (WEP) are expressions of a statement's plausibility.
We measure the ability of neural language processing models to capture the consensual probability level associated with each WEP.
- Score: 21.072862529656287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Words of estimative probability (WEP) are expressions of a statement's
plausibility (probably, maybe, likely, doubt, unlikely, impossible...).
Multiple surveys demonstrate the agreement of human evaluators when assigning
numerical probability levels to WEP. For example, "highly likely" corresponds to
a median chance of 0.90 ± 0.08 in the survey of Fagen-Ulmschneider (2015). In this
work, we measure the ability of neural language processing models to capture
the consensual probability level associated to each WEP. Firstly, we use the
UNLI dataset (Chen et al., 2020) which associates premises and hypotheses with
their perceived joint probability p, to construct prompts, e.g. "[PREMISE].
[WEP], [HYPOTHESIS]." and assess whether language models can predict whether
the WEP consensual probability level is close to p. Secondly, we construct a
dataset of WEP-based probabilistic reasoning, to test whether language models
can reason with WEP compositions. When prompted "[EVENTA] is likely. [EVENTB]
is impossible.", a causal language model should not express that [EVENTA&B] is
likely. We show that both tasks are unsolved by off-the-shelf English language
models, but that fine-tuning leads to transferable improvement.
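As a rough illustration of the two probing setups above, the sketch below builds the "[PREMISE]. [WEP], [HYPOTHESIS]." prompts from a UNLI-style (premise, hypothesis, p) triple and applies a conjunction sanity check; the WEP-to-probability levels, the tolerance, and the example triple are illustrative placeholders, not the paper's exact values or data.

```python
# Illustrative sketch of the two WEP probing tasks (placeholder probability
# levels inspired by WEP perception surveys, not the paper's exact figures).
WEP_LEVELS = {
    "impossible": 0.02,
    "unlikely": 0.25,
    "maybe": 0.50,
    "likely": 0.75,
    "highly likely": 0.90,
}
TOLERANCE = 0.10  # hypothetical threshold for "close to p"


def make_prompt(premise: str, wep: str, hypothesis: str) -> str:
    """Task 1 template from the abstract: "[PREMISE]. [WEP], [HYPOTHESIS]."."""
    return f"{premise}. {wep.capitalize()}, {hypothesis}."


def verification_label(wep: str, p: float) -> bool:
    """Gold label: is the WEP's consensual level close to the annotated probability p?"""
    return abs(WEP_LEVELS[wep] - p) <= TOLERANCE


def consistent_conjunction(wep_a: str, wep_b: str, wep_ab: str) -> bool:
    """Task 2 sanity check: P(A and B) can never exceed min(P(A), P(B)), so
    "A is likely. B is impossible." must not be followed by "A and B is likely"."""
    return WEP_LEVELS[wep_ab] <= min(WEP_LEVELS[wep_a], WEP_LEVELS[wep_b]) + TOLERANCE


if __name__ == "__main__":
    # Hypothetical UNLI-style triple: premise, hypothesis, perceived probability p.
    premise, hypothesis, p = "A man plays guitar on stage", "the man is a musician", 0.9
    for wep in WEP_LEVELS:
        print(make_prompt(premise, wep, hypothesis), "->", verification_label(wep, p))
    print(consistent_conjunction("likely", "impossible", "likely"))  # False: inconsistent
```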
Related papers
- Perceptions of Linguistic Uncertainty by Language Models and Humans [26.69714008538173]
We investigate how language models map linguistic expressions of uncertainty to numerical responses.
We find that 7 out of 10 models are able to map uncertainty expressions to probabilistic responses in a human-like manner.
They are, however, substantially more susceptible than humans to bias from their prior knowledge about a statement.
arXiv Detail & Related papers (2024-07-22T17:26:12Z)
- A Probability–Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
We show that when sampling corpora from an aligned language model, there exists a trade-off between the strings' average reward and average log-likelihood.
We provide a formal treatment of this phenomenon and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
- An Evaluation of Estimative Uncertainty in Large Language Models [3.04503073434724]
Estimative uncertainty has long been an area of study, including by intelligence agencies like the CIA.
This study compares estimative uncertainty in commonly used large language models (LLMs) to that of humans, and to each other.
We show that LLMs like GPT-3.5 and GPT-4 align with human estimates for some, but not all, WEPs presented in English.
arXiv Detail & Related papers (2024-05-24T03:39:31Z)
- Language Models (Mostly) Know What They Know [10.836210010868932]
We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly.
We investigate whether models can be trained to predict "P(IK)", the probability that "I know" the answer to a question, without reference to any particular proposed answer.
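A toy sketch of how a P(IK)-style training target could be derived under this description: sample several answers to a question and use the fraction that are correct as the probability the model is later trained to predict from the question alone. The `sample_answer` and `is_correct` callables below are hypothetical stand-ins, not the paper's implementation.

```python
import random
from typing import Callable


def p_ik_target(question: str,
                sample_answer: Callable[[str], str],
                is_correct: Callable[[str, str], bool],
                n_samples: int = 30) -> float:
    """Toy P(IK) target: the fraction of sampled answers to `question` that are
    correct; a model can then be trained to predict this number from the
    question alone, with no particular proposed answer in view."""
    hits = sum(is_correct(question, sample_answer(question)) for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    # Dummy "model" that answers correctly about 70% of the time.
    dummy_sample = lambda q: "right" if random.random() < 0.7 else "wrong"
    dummy_check = lambda q, a: a == "right"
    print(p_ik_target("What is the capital of France?", dummy_sample, dummy_check))
```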
arXiv Detail & Related papers (2022-07-11T22:59:39Z)
- Probabilistic Conformal Prediction Using Conditional Random Samples [73.26753677005331]
PCP is a predictive inference algorithm that estimates a target variable by a discontinuous predictive set.
It is efficient and compatible with either explicit or implicit conditional generative models.
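A minimal numpy sketch of the general recipe suggested by this summary, assuming a 1-D target: calibrate a radius from held-out samples drawn by a conditional generative model, then return a union of intervals around test-time samples. The function names and the toy Gaussian data are illustrative, not the authors' released code.

```python
import numpy as np


def pcp_calibrate(cal_y, cal_samples, alpha=0.1):
    """cal_y: (n,) held-out targets; cal_samples: (n, k) draws from a conditional
    generative model at the same inputs. The radius is a conformal quantile of
    each point's distance to its nearest generated sample."""
    dists = np.min(np.abs(cal_samples - cal_y[:, None]), axis=1)
    n = len(cal_y)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample correction
    return np.quantile(dists, level)


def pcp_predict_set(test_samples, radius):
    """Predictive set for one test input: a union of intervals (1-D balls) of the
    calibrated radius centered at each sample generated for that input."""
    return [(s - radius, s + radius) for s in np.sort(test_samples)]


# Toy usage with Gaussian draws standing in for a conditional generative model.
rng = np.random.default_rng(0)
cal_y = rng.normal(size=200)
cal_samples = rng.normal(size=(200, 10))
radius = pcp_calibrate(cal_y, cal_samples, alpha=0.1)
print(pcp_predict_set(rng.normal(size=10), radius)[:3])
```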
arXiv Detail & Related papers (2022-06-14T03:58:03Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation [3.1040192682787415]
Inferring the probability distribution of sentences or word sequences is a key process in natural language processing.
While word-level language models (LMs) have been widely adopted for computing the joint probabilities of word sequences, they have difficulty capturing a context long enough for sentence probability estimation (SPE).
Recent studies introduced training methods using sentence-level noise-contrastive estimation (NCE) with recurrent neural networks (RNNs).
We apply our method to a simple word-level RNN LM to focus on the effect of the sentence-level NCE training rather than on the network architecture.
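A hedged sketch of what a sentence-level NCE objective of this kind can look like: the LM's sentence score is trained to separate a real sentence from k noise sentences drawn from a noise distribution. The helper names and the toy numbers are assumptions for illustration, not the paper's implementation.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def sentence_nce_loss(data_score: float, data_log_noise: float,
                      noise_scores: list, noise_log_noise: list, k: int) -> float:
    """NCE loss for one real sentence and k noise sentences.
    *_score: the LM's unnormalized log-score s(x) for a whole sentence;
    *_log_noise: log-probability of that sentence under the noise distribution.
    Training pushes sigmoid(s(x) - log(k * P_noise(x))) towards 1 for real
    sentences and towards 0 for noise sentences."""
    loss = -log_sigmoid(data_score - (math.log(k) + data_log_noise))
    for s, ln in zip(noise_scores, noise_log_noise):
        loss -= log_sigmoid(-(s - (math.log(k) + ln)))
    return loss


# Toy numbers: the real sentence scores above its noise baseline, noise below.
print(sentence_nce_loss(data_score=-20.0, data_log_noise=-25.0,
                        noise_scores=[-30.0, -28.0],
                        noise_log_noise=[-24.0, -23.0], k=2))
```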
arXiv Detail & Related papers (2021-03-14T09:17:37Z)
- L2R2: Leveraging Ranking for Abductive Reasoning [65.40375542988416]
The abductive natural language inference task (αNLI) is proposed to evaluate the abductive reasoning ability of a learning system.
A novel L2R2 approach is proposed under the learning-to-rank framework.
Experiments on the ART dataset reach the state-of-the-art in the public leaderboard.
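As one hypothetical instantiation of the learning-to-rank framing (not necessarily the exact loss used in the paper), a pairwise margin loss over candidate-hypothesis plausibility scores could look like this:

```python
def pairwise_ranking_loss(scores, better_idx, margin=1.0):
    """Margin-based pairwise loss: the plausibility score of the more plausible
    hypothesis (index better_idx) should beat every other candidate by `margin`."""
    best = scores[better_idx]
    return sum(max(0.0, margin - (best - s))
               for i, s in enumerate(scores) if i != better_idx)


# Toy usage: two candidate explanations for an observation pair; the first is gold.
print(pairwise_ranking_loss([2.3, 1.1], better_idx=0))  # 0.0 when the margin is met
```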
arXiv Detail & Related papers (2020-05-22T15:01:23Z)
- Predicting Performance for Natural Language Processing Tasks [128.34208911925424]
We build regression models to predict the evaluation score of an NLP experiment given the experimental settings as input.
Experimenting on 9 different NLP tasks, we find that our predictors can produce meaningful predictions over unseen languages and different modeling architectures.
arXiv Detail & Related papers (2020-05-02T16:02:18Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)