Do Language Models Have Beliefs? Methods for Detecting, Updating, and
Visualizing Model Beliefs
- URL: http://arxiv.org/abs/2111.13654v1
- Date: Fri, 26 Nov 2021 18:33:59 GMT
- Title: Do Language Models Have Beliefs? Methods for Detecting, Updating, and
Visualizing Model Beliefs
- Authors: Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva,
Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer
- Abstract summary: Dennett (1995) famously argues that even thermostats have beliefs, on the view that a belief is simply an informational state decoupled from any motivational state.
In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful.
- Score: 76.6325846350907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Do language models have beliefs about the world? Dennett (1995) famously
argues that even thermostats have beliefs, on the view that a belief is simply
an informational state decoupled from any motivational state. In this paper, we
discuss approaches to detecting when models have beliefs about the world, and
we improve on methods for updating model beliefs to be more truthful, with a
focus on methods based on learned optimizers or hypernetworks. Our main
contributions include: (1) new metrics for evaluating belief-updating methods
that focus on the logical consistency of beliefs, (2) a training objective for
Sequential, Local, and Generalizing model updates (SLAG) that improves the
performance of learned optimizers, and (3) the introduction of the belief
graph, which is a new form of interface with language models that shows the
interdependencies between model beliefs. Our experiments suggest that models
possess belief-like qualities to only a limited extent, but update methods can
both fix incorrect model beliefs and greatly improve their consistency.
Although off-the-shelf optimizers are surprisingly strong belief-updating
baselines, our learned optimizers can outperform them in more difficult
settings than have been considered in past work. Code is available at
https://github.com/peterbhase/SLAG-Belief-Updating
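For intuition, here is a minimal Python sketch of what detecting beliefs, scoring their logical consistency, and reading off a small belief graph could look like. It is not the paper's implementation: the `ProbeFn` interface, the 0.5 threshold, the canned probabilities, and the hand-written entailment table are illustrative assumptions standing in for a real probe of a language model.

```python
from typing import Callable, Dict, List, Tuple

ProbeFn = Callable[[str], float]  # statement -> estimated P(model answers "true")


def believes(probe: ProbeFn, statement: str, threshold: float = 0.5) -> bool:
    """Count a statement as 'believed' if the probe assigns P(true) above the threshold."""
    return probe(statement) > threshold


def contradiction_rate(probe: ProbeFn, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (statement, negation) pairs believed simultaneously, i.e. inconsistent pairs."""
    bad = sum(believes(probe, s) and believes(probe, n) for s, n in pairs)
    return bad / max(len(pairs), 1)


def belief_graph(probe: ProbeFn, entails: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Toy belief graph: for each believed premise, list entailed statements the model rejects."""
    return {
        premise: [c for c in consequences if not believes(probe, c)]
        for premise, consequences in entails.items()
        if believes(probe, premise)
    }


if __name__ == "__main__":
    # Stand-in probe with canned probabilities; a real probe would query a language model.
    canned = {
        "A viper is a vertebrate.": 0.9,
        "A viper is not a vertebrate.": 0.7,  # contradicts the belief above
        "A viper has a brain.": 0.2,          # entailed by the premise, but not believed
    }
    probe: ProbeFn = lambda s: canned.get(s, 0.5)

    print(contradiction_rate(probe, [("A viper is a vertebrate.",
                                      "A viper is not a vertebrate.")]))  # 1.0
    print(belief_graph(probe, {"A viper is a vertebrate.": ["A viper has a brain."]}))
```

In the paper's setting, beliefs are probed from an actual model and the belief graph captures interdependencies between model beliefs; the sketch above only fixes the vocabulary of detection and consistency.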
Related papers
- Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557] (arXiv 2024-10-22)
We focus on the trustworthiness of language models with respect to retrieval augmentation.
We argue that retrieval-augmented language models have the inherent capability to supply responses according to both contextual and parametric knowledge.
Inspired by aligning language models with human preferences, we take the first step towards aligning retrieval-augmented language models so that they respond relying solely on external evidence.
- Collapsed Language Models Promote Fairness [88.48232731113306] (arXiv 2024-10-06)
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z) - Interpretability Needs a New Paradigm [49.134097841837715]
Interpretability is the study of explaining models in understandable terms to humans.
At the core of the debate between interpretability paradigms is how each ensures its explanations are faithful, i.e., true to the model's behavior.
This paper's position is that we should think about new paradigms while staying vigilant regarding faithfulness.
arXiv Detail & Related papers (2024-05-08T19:31:06Z) - Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization [11.140366256534474]
Existing vision-language models exhibit strong generalization on a variety of visual domains and tasks.
We propose OGEN, a novel approach to improve the OOD GENeralization of finetuned models.
Specifically, a class-conditional feature generator is introduced to synthesize OOD features using just the class name of any unknown class.
arXiv Detail & Related papers (2024-01-29T06:57:48Z) - Fine-tuning Language Models for Factuality [96.5203774943198]
The capabilities of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
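As a rough illustration of the feedback-to-text idea only (not the paper's actual template; the field names and wording below are assumptions), feedback could be serialized into a single training sequence like this:

```python
def hindsight_sequence(prompt: str, good: str, bad: str) -> str:
    """Serialize preference feedback into one training string, so the model
    conditions on feedback phrases ('good'/'bad') while seeing both answers."""
    return (
        f"{prompt}\n"
        f"A good answer is: {good}\n"
        f"A bad answer is: {bad}"
    )


print(hindsight_sequence(
    prompt="Explain why the sky is blue.",
    good="Sunlight scatters off air molecules, and shorter (blue) wavelengths scatter most.",
    bad="Because the ocean reflects onto the sky.",
))
```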
- BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief [20.60798513220516] (arXiv 2021-09-29)
It can be hard to identify what a pretrained language model (PTLM) actually "believes" about the world, making it susceptible to inconsistent behavior and simple errors.
Our approach is to embed a PTLM in a broader system that includes an evolving, symbolic memory of beliefs.
We show that, in a controlled experimental setting, these two mechanisms result in more consistent beliefs in the overall system.
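To make the idea of an evolving, symbolic belief memory concrete, here is a toy sketch (an assumption-laden illustration, not BeliefBank itself; the constraint format and flagging rule are invented for this example) of a memory that records answers and flags violated implications:

```python
from typing import Dict, List, Tuple


class BeliefMemory:
    """Toy symbolic memory: stores yes/no answers and checks them against
    implication constraints of the form (premise, consequence)."""

    def __init__(self, constraints: List[Tuple[str, str]]):
        self.constraints = constraints       # "if premise is true, consequence is true"
        self.beliefs: Dict[str, bool] = {}   # statement -> stored answer

    def record(self, statement: str, answer: bool) -> None:
        self.beliefs[statement] = answer

    def violations(self) -> List[Tuple[str, str]]:
        """Return constraints whose premise is believed but whose consequence is denied."""
        return [
            (p, c) for p, c in self.constraints
            if self.beliefs.get(p) is True and self.beliefs.get(c) is False
        ]


memory = BeliefMemory(constraints=[("An eagle is a bird.", "An eagle can lay eggs.")])
memory.record("An eagle is a bird.", True)
memory.record("An eagle can lay eggs.", False)   # inconsistent with the stored constraint
print(memory.violations())                       # [('An eagle is a bird.', 'An eagle can lay eggs.')]
```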
- Enriching a Model's Notion of Belief using a Persistent Memory [20.60798513220516] (arXiv 2021-04-16)
Pretrained language models (PTLMs) can produce inconsistent answers to questions when probed.
It can be hard to identify what the model actually "believes" about the world.
Our goal is to reduce this problem, so systems are more globally consistent and accurate in their answers.
- Are We Ready For Learned Cardinality Estimation? [6.703418426908341] (arXiv 2020-12-12)
We show that learned models are indeed more accurate than traditional methods, but they often suffer from high training and inference costs.
We also explore whether these learned models are ready for dynamic environments (i.e., frequent data updates).
Our results show that the performance of learned methods can be greatly affected by the changes in correlation, skewness, or domain size.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.