Truth-value judgment in language models: belief directions are context sensitive
- URL: http://arxiv.org/abs/2404.18865v1
- Date: Mon, 29 Apr 2024 16:52:57 GMT
- Title: Truth-value judgment in language models: belief directions are context sensitive
- Authors: Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen
- Abstract summary: Latent spaces of large language models contain directions predictive of the truth of sentences.
Multiple methods recover such directions and build probes that are described as getting at a model's "knowledge" or "beliefs".
We investigate this phenomenon, looking closely at the impact of context on the probes.
- Score: 2.324913904215885
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recent work has demonstrated that the latent spaces of large language models (LLMs) contain directions predictive of the truth of sentences. Multiple methods recover such directions and build probes that are described as getting at a model's "knowledge" or "beliefs". We investigate this phenomenon, looking closely at the impact of context on the probes. Our experiments establish where in the LLM the probe's predictions can be described as being conditional on the preceding (related) sentences. Specifically, we quantify the responsiveness of the probes to the presence of (negated) supporting and contradicting sentences, and score the probes on their consistency. We also perform a causal intervention experiment, investigating whether moving the representation of a premise along these belief directions influences the position of the hypothesis along that same direction. We find that the probes we test are generally context sensitive, but that contexts which should not affect the truth often still impact the probe outputs. Our experiments show that the types of errors depend on the layer, the (type of) model, and the kind of data. Finally, our results suggest that belief directions are (one of the) causal mediators in the inference process that incorporates in-context information.
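The abstract describes two concrete steps that a short sketch can make tangible: (1) recover a belief direction by fitting a linear probe on hidden states of true and false statements, and (2) intervene causally by shifting a premise's representation along that direction mid-forward-pass, then reading off where the hypothesis lands along the direction at a later layer. The following is a minimal sketch under stated assumptions, not the paper's implementation: GPT-2, the layer pair, the toy statements, and logistic regression (standing in for whichever probing methods the paper actually tests) are all illustrative choices.

```python
# Hypothetical sketch, not the authors' released code: model choice (GPT-2),
# layer indices, toy statements, and the logistic-regression probe are all
# illustrative assumptions standing in for the paper's actual setup.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

# Toy true/false statements; real experiments use many more.
statements = [
    ("Paris is the capital of France.", 1),
    ("Paris is the capital of Spain.", 0),
    ("Water boils at 100 degrees Celsius at sea level.", 1),
    ("Water boils at 10 degrees Celsius at sea level.", 0),
]

def last_token_states(texts, layer):
    """Stack the final-token hidden state of each text at the given layer."""
    vecs = []
    for text in texts:
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids)
        vecs.append(out.hidden_states[layer][0, -1])
    return torch.stack(vecs)

def belief_direction(layer):
    """Fit a linear probe at one layer; its normalized weight vector is the
    candidate belief direction for that layer."""
    X = last_token_states([s for s, _ in statements], layer).numpy()
    y = [label for _, label in statements]
    w = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    w = torch.tensor(w, dtype=torch.float32)
    return w / w.norm()

INT_LAYER, READ_LAYER = 5, 9  # assumed: intervene early, read out later
d_int = belief_direction(INT_LAYER)
d_read = belief_direction(READ_LAYER)

def hypothesis_projection(premise, hypothesis, alpha):
    """Shift the premise's final-token state along d_int during the forward
    pass, then return the hypothesis's projection onto d_read."""
    text = premise + " " + hypothesis
    premise_end = len(tok(premise)["input_ids"]) - 1  # premise's last token

    def shift_hook(_module, _inputs, output):
        h = output[0]  # block output: (batch, seq, hidden)
        h[0, premise_end] += alpha * d_int
        return (h,) + output[1:]

    # hidden_states[k] is the output of transformer block k - 1, so hooking
    # block INT_LAYER - 1 edits the layer-INT_LAYER representation.
    handle = model.transformer.h[INT_LAYER - 1].register_forward_hook(shift_hook)
    try:
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids)
    finally:
        handle.remove()
    return float(out.hidden_states[READ_LAYER][0, -1] @ d_read)

premise, hypothesis = "Paris is the capital of Spain.", "So Paris is in Spain."
for alpha in (0.0, 5.0, -5.0):
    print(alpha, hypothesis_projection(premise, hypothesis, alpha))
```

If the belief direction acts as a causal mediator, the printed projection should track `alpha`; a flat response would argue against mediation at that layer pair.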
Related papers
- Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models [29.67884478799914]
Large Language Models (LLMs) are capable of generating persuasive Natural Language Explanations (NLEs) to justify their answers.
Recent studies have proposed various methods to measure the faithfulness of NLEs, typically by inserting perturbations at the explanation or feature level.
We argue that these approaches are neither comprehensive nor correctly designed according to the established definition of faithfulness.
arXiv Detail & Related papers (2024-10-18T03:45:42Z)
- How Entangled is Factuality and Deception in German? [10.790059579736276]
Research on deception detection and fact checking often conflates factual accuracy with the truthfulness of statements.
The belief-based deception framework disentangles these properties by defining texts as deceptive when there is a mismatch between what people say and what they truly believe.
We test the effectiveness of computational models in detecting deception using an established corpus of belief-based argumentation.
arXiv Detail & Related papers (2024-09-30T10:23:13Z)
- Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations.
We compare 6,480 models fine-tuned from state-of-the-art visual backbones, and find that sampling and modeling choices significantly affect the accuracy of the causal estimate.
Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
arXiv Detail & Related papers (2024-05-27T13:26:34Z)
- Is Probing All You Need? Indicator Tasks as an Alternative to Probing Embedding Spaces [19.4968960182412]
We introduce the term indicator tasks for non-trainable tasks which are used to query embedding spaces for the existence of certain properties.
We show that the application of a suitable indicator provides a more accurate picture of the information captured and removed compared to probes.
arXiv Detail & Related papers (2023-10-24T15:08:12Z)
- Uncertain Evidence in Probabilistic Models and Stochastic Simulators [80.40110074847527]
We consider the problem of performing Bayesian inference in probabilistic models where observations are accompanied by uncertainty, referred to as 'uncertain evidence'.
We explore how to interpret uncertain evidence, and by extension the importance of proper interpretation as it pertains to inference about latent variables.
We devise concrete guidelines on how to account for uncertain evidence and we provide new insights, particularly regarding consistency.
arXiv Detail & Related papers (2022-10-21T20:32:59Z)
- Naturalistic Causal Probing for Morpho-Syntax [76.83735391276547]
We suggest a naturalistic strategy for input-level intervention on real-world data in Spanish.
Using our approach, we isolate morpho-syntactic features from confounders in sentences.
We apply this methodology to analyze causal effects of gender and number on contextualized representations extracted from pre-trained models.
arXiv Detail & Related papers (2022-05-14T11:47:58Z)
- Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence [45.9949173746044]
We show that large pre-trained language models (PLMs) do not satisfy the logical negation property (LNP).
We propose a novel intermediate training task, named meaning-matching, designed to directly learn a meaning-text correspondence.
We find that the task enables PLMs to learn lexical semantic information.
arXiv Detail & Related papers (2022-05-08T08:37:36Z)
- AmbiFC: Fact-Checking Ambiguous Claims with Evidence [57.7091560922174]
We present AmbiFC, a fact-checking dataset with 10k claims derived from real-world information needs.
We analyze disagreements arising from ambiguity when comparing claims against evidence in AmbiFC.
We develop models that predict veracity while handling this ambiguity via soft labels.
arXiv Detail & Related papers (2021-04-01T17:40:08Z)
- Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals [53.484562601127195]
We point out the inability to infer behavioral conclusions from probing results.
We offer an alternative method that focuses on how the information is being used, rather than on what information is encoded.
arXiv Detail & Related papers (2020-06-01T15:00:11Z)
- CausalVAE: Structured Causal Disentanglement in Variational Autoencoder [52.139696854386976]
The framework of variational autoencoder (VAE) is commonly used to disentangle independent factors from observations.
We propose a new VAE-based framework named CausalVAE, which includes a Causal Layer to transform independent factors into causal endogenous ones.
Results show that the causal representations learned by CausalVAE are semantically interpretable, and their causal relationship as a Directed Acyclic Graph (DAG) is identified with good accuracy.
arXiv Detail & Related papers (2020-04-18T20:09:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.