Language Models are Bounded Pragmatic Speakers: Understanding RLHF from a Bayesian Cognitive Modeling Perspective
- URL: http://arxiv.org/abs/2305.17760v6
- Date: Mon, 1 Jan 2024 21:06:57 GMT
- Title: Language Models are Bounded Pragmatic Speakers: Understanding RLHF from a Bayesian Cognitive Modeling Perspective
- Authors: Khanh Nguyen
- Abstract summary: This paper formulates a probabilistic cognitive model called the bounded pragmatic speaker.
We demonstrate that large language models fine-tuned with reinforcement learning from human feedback embody a model of thought that resembles a fast-and-slow model.
- Score: 2.8282906214258805
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: How do language models "think"? This paper formulates a probabilistic
cognitive model called the bounded pragmatic speaker, which can characterize
the operation of different variations of language models. Specifically, we
demonstrate that large language models fine-tuned with reinforcement learning
from human feedback (Ouyang et al., 2022) embody a model of thought that
conceptually resembles a fast-and-slow model (Kahneman, 2011), which
psychologists have attributed to humans. We discuss the limitations of
reinforcement learning from human feedback as a fast-and-slow model of thought
and propose avenues for expanding this framework. In essence, our research
highlights the value of adopting a cognitive probabilistic modeling approach to
gain insights into the comprehension, evaluation, and advancement of language
models.
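As a pointer to the underlying formalism, the sketch below uses standard RSA and RLHF notation (which may differ from the paper's own symbols) to show why the two framings line up: the pragmatic speaker reweights a fast base speaker by a slow internal listener, and the optimal KL-regularized RLHF policy reweights the base model by an exponentiated reward.

```latex
% RSA-style pragmatic speaker: a fast base speaker S_0 reweighted by a
% slow internal listener L (alpha is a rationality parameter).
S_1(u \mid w) \propto S_0(u \mid w)\, L(w \mid u)^{\alpha}

% KL-regularized RLHF objective over policies \pi, with base model \pi_0,
% reward model R, and coefficient \beta ...
J(\pi) = \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[R(x, y)\big]
       - \beta\, \mathrm{KL}\big(\pi(\cdot \mid x) \,\|\, \pi_0(\cdot \mid x)\big)

% ... whose optimal policy has the same "base speaker times evaluator" shape:
\pi^{*}(y \mid x) \propto \pi_0(y \mid x)\, \exp\big(R(x, y) / \beta\big)
```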
Related papers
- Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557]
We focus on the trustworthiness of language models with respect to retrieval augmentation.
We argue that retrieval-augmented language models have the inherent capability to supply responses grounded in both contextual and parametric knowledge.
Inspired by aligning language models with human preferences, we take the first step towards aligning retrieval-augmented language models to a state where they respond relying solely on external evidence.
arXiv Detail & Related papers (2024-10-22T09:25:21Z)
- Conceptual and Unbiased Reasoning in Language Models [98.90677711523645]
We propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions.
We show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks.
We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making.
arXiv Detail & Related papers (2024-03-30T00:53:53Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought [124.40905824051079]
We propose rational meaning construction, a computational framework for language-informed thinking.
We frame linguistic meaning as a context-sensitive mapping from natural language into a probabilistic language of thought.
We show that LLMs can generate context-sensitive translations that capture pragmatically-appropriate linguistic meanings.
We extend our framework to integrate cognitively-motivated symbolic modules.
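To give a feel for the pipeline, here is a toy sketch in Python: an invented urn world model stands in for the richer generative world models the paper targets, and `most_are_red` plays the role of a translated meaning; the names and the rejection-sampling inference are illustrative assumptions, not the paper's implementation.

```python
import random

# Toy world model (assumed for illustration): an urn of n balls,
# each independently red with probability p_red.
def sample_world(n_balls=10, p_red=0.5):
    return [random.random() < p_red for _ in range(n_balls)]

# Assumed "translation" of the utterance "Most of the balls are red"
# into a condition in the probabilistic language of thought.
def most_are_red(world):
    return sum(world) > len(world) / 2

# Inference by rejection sampling: condition the world model on the
# utterance's meaning, then answer a downstream query.
def posterior_samples(condition, n=10_000):
    return [w for w in (sample_world() for _ in range(n)) if condition(w)]

worlds = posterior_samples(most_are_red)
print("E[#red | 'most are red'] =", sum(map(sum, worlds)) / len(worlds))
```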
arXiv Detail & Related papers (2023-06-22T05:14:00Z)
- Turning large language models into cognitive models [0.0]
We show that large language models can be turned into cognitive models.
These models offer accurate representations of human behavior, even outperforming traditional cognitive models in two decision-making domains.
Taken together, these results suggest that large, pre-trained models can be adapted to become generalist cognitive models.
arXiv Detail & Related papers (2023-06-06T18:00:01Z)
- Human-like Few-Shot Learning via Bayesian Reasoning over Natural Language [7.11993673836973]
Humans can efficiently learn a broad range of concepts.
We introduce a model of inductive learning that seeks to be human-like in that sense.
arXiv Detail & Related papers (2023-06-05T11:46:45Z)
- Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improving language responses, in which multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach can be applied directly to existing black-box models and uses the same procedure and prompts for all tasks we investigate.
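A minimal sketch of such a debate loop, assuming a hypothetical `generate(prompt) -> str` wrapper around any black-box chat model; the prompt wording is illustrative, not the paper's.

```python
def debate(question, generate, n_agents=3, n_rounds=2):
    # Round 0: each agent answers the question independently.
    answers = [generate(question) for _ in range(n_agents)]
    for _ in range(n_rounds):
        new_answers = []
        for i in range(n_agents):
            # Show each agent the other agents' latest answers.
            others = "\n\n".join(a for j, a in enumerate(answers) if j != i)
            prompt = (
                f"{question}\n\nOther agents answered:\n{others}\n\n"
                "Considering their reasoning, give your updated answer."
            )
            new_answers.append(generate(prompt))
        answers = new_answers
    # A final aggregation step (e.g., majority vote) would pick one answer.
    return answers
```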
arXiv Detail & Related papers (2023-05-23T17:55:11Z)
- Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
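A minimal sketch of the data construction, with illustrative template strings (the paper's own feedback phrasings may differ): both responses appear in one training sequence, each paired with verbal feedback, so the model learns to condition on feedback.

```python
# `good` and `bad` are a human-preferred and a dispreferred response
# to the same prompt; the feedback markers are assumed templates.
def chain_of_hindsight_example(prompt, good, bad):
    return (
        f"{prompt}\n"
        f"A helpful answer: {good}\n"
        f"An unhelpful answer: {bad}\n"
    )

# At inference time, generation is prefixed with the positive marker,
# e.g. f"{prompt}\nA helpful answer:", to elicit the preferred behavior.
```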
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
- Uncovering Constraint-Based Behavior in Neural Models via Targeted Fine-Tuning [9.391375268580806]
We show that competing linguistic processes within a language obscure underlying linguistic knowledge.
While human behavior has been found to be similar across languages, we find cross-linguistic variation in model behavior.
Our results suggest that models need to learn both the linguistic constraints in a language and their relative ranking, with mismatches in either producing non-human-like behavior.
arXiv Detail & Related papers (2021-06-02T14:52:11Z)
- Probabilistic Predictions of People Perusing: Evaluating Metrics of Language Model Performance for Psycholinguistic Modeling [0.8668211481067458]
We re-evaluate a claim due to Goodkind and Bicknell that a language model's ability to model reading times is a linear function of its perplexity.
We show that the proposed relation does not always hold for Long Short-Term Memory networks, Transformers, and pre-trained models.
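For reference, per-token surprisal and perplexity can be computed as below; this is a generic sketch with an assumed GPT-2 checkpoint, not the paper's evaluation code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The old man the boat."
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits

# Per-token surprisal: -log p(token_t | tokens_<t), in nats.
logp = torch.log_softmax(logits[0, :-1], dim=-1)
surprisal = -logp[torch.arange(ids.size(1) - 1), ids[0, 1:]]

# Perplexity is exp of mean surprisal; psycholinguistic studies typically
# regress reading times on per-token surprisal, not corpus perplexity.
print(list(zip(tok.convert_ids_to_tokens(ids[0, 1:].tolist()),
               surprisal.tolist())))
print("perplexity:", surprisal.mean().exp().item())
```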
arXiv Detail & Related papers (2020-09-08T19:12:06Z)
- Probing Neural Language Models for Human Tacit Assumptions [36.63841251126978]
Humans carry stereotypic tacit assumptions (STAs) or propositional beliefs about generic concepts.
We construct a diagnostic set of word prediction prompts to evaluate whether recent neural contextualized language models trained on large text corpora capture STAs.
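A minimal sketch of one such cloze-style probe, with an assumed model and prompt rather than the paper's diagnostic set:

```python
from transformers import pipeline

# Fill-mask probing: does the model complete a generic-concept prompt
# with stereotypically assumed properties?
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("A bear has [MASK].", top_k=5):
    print(pred["token_str"], round(pred["score"], 3))
```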
arXiv Detail & Related papers (2020-04-10T01:48:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.