Do Language Models Have Beliefs? Methods for Detecting, Updating, and
Visualizing Model Beliefs
- URL: http://arxiv.org/abs/2111.13654v1
- Date: Fri, 26 Nov 2021 18:33:59 GMT
- Title: Do Language Models Have Beliefs? Methods for Detecting, Updating, and
Visualizing Model Beliefs
- Authors: Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva,
Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer
- Abstract summary: Dennett (1995) famously argues that even thermostats have beliefs, on the view that a belief is simply an informational state decoupled from any motivational state.
In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful.
- Score: 76.6325846350907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Do language models have beliefs about the world? Dennett (1995) famously
argues that even thermostats have beliefs, on the view that a belief is simply
an informational state decoupled from any motivational state. In this paper, we
discuss approaches to detecting when models have beliefs about the world, and
we improve on methods for updating model beliefs to be more truthful, with a
focus on methods based on learned optimizers or hypernetworks. Our main
contributions include: (1) new metrics for evaluating belief-updating methods
that focus on the logical consistency of beliefs, (2) a training objective for
Sequential, Local, and Generalizing model updates (SLAG) that improves the
performance of learned optimizers, and (3) the introduction of the belief
graph, which is a new form of interface with language models that shows the
interdependencies between model beliefs. Our experiments suggest that models
possess belief-like qualities to only a limited extent, but update methods can
both fix incorrect model beliefs and greatly improve their consistency.
Although off-the-shelf optimizers are surprisingly strong belief-updating
baselines, our learned optimizers can outperform them in more difficult
settings than have been considered in past work. Code is available at
https://github.com/peterbhase/SLAG-Belief-Updating
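For intuition, here is a minimal Python sketch of what detecting beliefs, scoring their logical consistency, and reading off a small belief graph could look like. It is not the paper's implementation: the `ProbeFn` interface, the 0.5 threshold, the canned probabilities, and the hand-written entailment table are illustrative assumptions standing in for a real probe of a language model.

```python
from typing import Callable, Dict, List, Tuple

ProbeFn = Callable[[str], float]  # statement -> estimated P(model answers "true")


def believes(probe: ProbeFn, statement: str, threshold: float = 0.5) -> bool:
    """Count a statement as 'believed' if the probe assigns P(true) above the threshold."""
    return probe(statement) > threshold


def contradiction_rate(probe: ProbeFn, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (statement, negation) pairs believed simultaneously, i.e. inconsistent pairs."""
    bad = sum(believes(probe, s) and believes(probe, n) for s, n in pairs)
    return bad / max(len(pairs), 1)


def belief_graph(probe: ProbeFn, entails: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Toy belief graph: for each believed premise, list entailed statements the model rejects."""
    return {
        premise: [c for c in consequences if not believes(probe, c)]
        for premise, consequences in entails.items()
        if believes(probe, premise)
    }


if __name__ == "__main__":
    # Stand-in probe with canned probabilities; a real probe would query a language model.
    canned = {
        "A viper is a vertebrate.": 0.9,
        "A viper is not a vertebrate.": 0.7,  # contradicts the belief above
        "A viper has a brain.": 0.2,          # entailed by the premise, but not believed
    }
    probe: ProbeFn = lambda s: canned.get(s, 0.5)

    print(contradiction_rate(probe, [("A viper is a vertebrate.",
                                      "A viper is not a vertebrate.")]))  # 1.0
    print(belief_graph(probe, {"A viper is a vertebrate.": ["A viper has a brain."]}))
```

In the paper's setting, beliefs are probed from an actual model and the belief graph captures interdependencies between model beliefs; the sketch above only fixes the vocabulary of detection and consistency.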
Related papers
- Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557] (arXiv 2024-10-22)
We focus on the trustworthiness of language models with respect to retrieval augmentation.
We argue that retrieval-augmented language models have the inherent capability to supply responses according to both contextual and parametric knowledge.
Inspired by aligning language models with human preferences, we take the first step towards aligning retrieval-augmented language models so that they respond relying solely on external evidence.
- Collapsed Language Models Promote Fairness [88.48232731113306] (arXiv 2024-10-06)
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z) - Interpretability Needs a New Paradigm [49.134097841837715]
Interpretability is the study of explaining models in understandable terms to humans.
At the core of the debate between interpretability paradigms is how each ensures its explanations are faithful, i.e., true to the model's behavior.
This paper's position is that we should think about new paradigms while staying vigilant regarding faithfulness.
arXiv Detail & Related papers (2024-05-08T19:31:06Z) - Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization [11.140366256534474]
Existing vision-language models exhibit strong generalization on a variety of visual domains and tasks.
We propose OGEN, a novel approach to improve the OOD GENeralization of finetuned models.
Specifically, a class-conditional feature generator is introduced to synthesize OOD features using just the class name of any unknown class.
arXiv Detail & Related papers (2024-01-29T06:57:48Z) - Fine-tuning Language Models for Factuality [96.5203774943198]
The capabilities of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
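As a rough illustration of the feedback-to-text idea only (not the paper's actual template; the field names and wording below are assumptions), feedback could be serialized into a single training sequence like this:

```python
def hindsight_sequence(prompt: str, good: str, bad: str) -> str:
    """Serialize preference feedback into one training string, so the model
    conditions on feedback phrases ('good'/'bad') while seeing both answers."""
    return (
        f"{prompt}\n"
        f"A good answer is: {good}\n"
        f"A bad answer is: {bad}"
    )


print(hindsight_sequence(
    prompt="Explain why the sky is blue.",
    good="Sunlight scatters off air molecules, and shorter (blue) wavelengths scatter most.",
    bad="Because the ocean reflects onto the sky.",
))
```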
- BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief [20.60798513220516] (arXiv 2021-09-29)
It can be hard to identify what a pretrained language model (PTLM) actually "believes" about the world, making it susceptible to inconsistent behavior and simple errors.
Our approach is to embed a PTLM in a broader system that includes an evolving, symbolic memory of beliefs.
We show that, in a controlled experimental setting, these two mechanisms result in more consistent beliefs in the overall system.
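To make the idea of an evolving, symbolic belief memory concrete, here is a toy sketch (an assumption-laden illustration, not BeliefBank itself; the constraint format and flagging rule are invented for this example) of a memory that records answers and flags violated implications:

```python
from typing import Dict, List, Tuple


class BeliefMemory:
    """Toy symbolic memory: stores yes/no answers and checks them against
    implication constraints of the form (premise, consequence)."""

    def __init__(self, constraints: List[Tuple[str, str]]):
        self.constraints = constraints       # "if premise is true, consequence is true"
        self.beliefs: Dict[str, bool] = {}   # statement -> stored answer

    def record(self, statement: str, answer: bool) -> None:
        self.beliefs[statement] = answer

    def violations(self) -> List[Tuple[str, str]]:
        """Return constraints whose premise is believed but whose consequence is denied."""
        return [
            (p, c) for p, c in self.constraints
            if self.beliefs.get(p) is True and self.beliefs.get(c) is False
        ]


memory = BeliefMemory(constraints=[("An eagle is a bird.", "An eagle can lay eggs.")])
memory.record("An eagle is a bird.", True)
memory.record("An eagle can lay eggs.", False)   # inconsistent with the stored constraint
print(memory.violations())                       # [('An eagle is a bird.', 'An eagle can lay eggs.')]
```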
- Enriching a Model's Notion of Belief using a Persistent Memory [20.60798513220516] (arXiv 2021-04-16)
Pretrained language models (PTLMs) can produce inconsistent answers to questions when probed.
It can be hard to identify what the model actually "believes" about the world.
Our goal is to reduce this problem, so systems are more globally consistent and accurate in their answers.
- Are We Ready For Learned Cardinality Estimation? [6.703418426908341] (arXiv 2020-12-12)
We show that learned models are indeed more accurate than traditional methods, but they often suffer from high training and inference costs.
We also explore whether these learned models are ready for dynamic environments (i.e., frequent data updates).
Our results show that the performance of learned methods can be greatly affected by the changes in correlation, skewness, or domain size.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.