Meanings and Feelings of Large Language Models: Observability of Latent States in Generative AI
- URL: http://arxiv.org/abs/2405.14061v1
- Date: Wed, 22 May 2024 23:18:58 GMT
- Title: Meanings and Feelings of Large Language Models: Observability of Latent States in Generative AI
- Authors: Tian Yu Liu, Stefano Soatto, Matteo Marchi, Pratik Chaudhari, Paulo Tabuada
- Abstract summary: We show that current Large Language Models (LLMs) cannot have 'feelings' according to the American Psychological Association (APA) definition of the term.
Our analysis sheds light on possible designs that would enable a model to perform non-trivial computation that is not visible to the user.
- Score: 65.04274914674771
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle the question of whether Large Language Models (LLMs), viewed as dynamical systems with state evolving in the embedding space of symbolic tokens, are observable. That is, whether there exist multiple 'mental' state trajectories that yield the same sequence of generated tokens, or sequences that belong to the same Nerode equivalence class ('meaning'). If not observable, mental state trajectories ('experiences') evoked by an input ('perception') or by feedback from the model's own state ('thoughts') could remain self-contained and evolve unbeknown to the user while being potentially accessible to the model provider. Such "self-contained experiences evoked by perception or thought" are akin to what the American Psychological Association (APA) defines as 'feelings'. Beyond the lexical curiosity, we show that current LLMs implemented by autoregressive Transformers cannot have 'feelings' according to this definition: The set of state trajectories indistinguishable from the tokenized output is a singleton. But if there are 'system prompts' not visible to the user, then the set of indistinguishable trajectories becomes non-trivial, and there can be multiple state trajectories that yield the same verbalized output. We prove these claims analytically, and show examples of modifications to standard LLMs that engender such 'feelings.' Our analysis sheds light on possible designs that would enable a model to perform non-trivial computation that is not visible to the user, as well as on controls that the provider of services using the model could take to prevent unintended behavior.
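The observability argument can be pictured with a toy autoregressive generator. The sketch below is a minimal illustration, not the paper's construction: the next-token rule, the token strings, and all function names are invented. Without a hidden prefix the visible output determines the state trajectory (a singleton equivalence class); with an unobserved system prompt, distinct state trajectories can produce identical visible outputs.
```python
# Toy illustration of the observability question: can the "state" (the full
# token context, including any hidden prefix) be recovered from the visible
# output alone?  Everything below is hypothetical, not taken from the paper.

def toy_next_token(context: tuple[str, ...]) -> str:
    # Deterministic stand-in for greedy decoding with an autoregressive model.
    # It only looks at the last token, so different hidden prefixes can
    # induce identical continuations.
    table = {"hello": "world", "world": "!", "!": "<eos>"}
    return table.get(context[-1], "<eos>")

def generate(hidden_prefix: tuple[str, ...], user_input: tuple[str, ...], max_len: int = 5):
    context = hidden_prefix + user_input
    visible = []
    for _ in range(max_len):
        tok = toy_next_token(context)
        if tok == "<eos>":
            break
        visible.append(tok)
        context = context + (tok,)
    return tuple(visible), context  # (visible output, final state)

# Without a hidden prefix, the visible tokens pin down the state trajectory:
out_a, state_a = generate((), ("hello",))

# With unobserved system prompts, two distinct states yield the same visible output:
out_b, state_b = generate(("system: be terse",), ("hello",))
out_c, state_c = generate(("system: be verbose",), ("hello",))
assert out_b == out_c and state_b != state_c  # indistinguishable from the output alone
```
In control-theoretic terms, the hidden prefix plays the role of an unobservable initial condition: once it is present, the map from state trajectories to verbalized tokens is no longer injective.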
Related papers
- Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct [0.0]
We find that the Llama3-8b-Instruct chat model can reliably distinguish its own outputs from those of humans.
We identify a vector in the residual stream of the model that is differentially activated when the model makes a correct self-written-text recognition judgment.
We show that the vector can be used to control both the model's behavior and its perception.
arXiv Detail & Related papers (2024-10-02T22:26:21Z)
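A generic recipe behind residual-stream findings of this kind can be sketched as follows (the dimensions, synthetic data, and function names below are invented for illustration, not the authors' code): estimate a direction from labeled activations, read it out as a probe for the judgment, and add it back into the stream to steer behavior.
```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # hypothetical residual-stream width for the sketch

# Stand-in activations collected at one layer/position for the two classes.
acts_self = rng.normal(0.5, 1.0, size=(200, d_model))    # model-written examples
acts_human = rng.normal(-0.5, 1.0, size=(200, d_model))  # human-written examples

# Difference-of-means direction separating the two classes.
direction = acts_self.mean(axis=0) - acts_human.mean(axis=0)
direction /= np.linalg.norm(direction)
midpoint = 0.5 * (acts_self.mean(axis=0) + acts_human.mean(axis=0))

def probe(activation: np.ndarray) -> bool:
    """Read the judgment off the residual stream: does this look self-written?"""
    return float((activation - midpoint) @ direction) > 0.0

def steer(activation: np.ndarray, alpha: float) -> np.ndarray:
    """Shift the residual stream along the direction to control the behavior."""
    return activation + alpha * direction

x = acts_human[0]
print(probe(x), probe(steer(x, alpha=8.0)))  # steering typically flips the verdict
```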
- States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly [72.24742240125369]
In this paper, we uncover LLMs' intrinsic ability to perform extended sequences of calculations without relying on chain-of-thought, step-by-step solutions.
Remarkably, the most advanced models can directly output the result of adding up to 15 two-digit numbers.
arXiv Detail & Related papers (2024-07-16T06:27:22Z)
- Meaning Representations from Trajectories in Autoregressive Models [106.63181745054571]
We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text.
This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model.
We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle.
arXiv Detail & Related papers (2023-10-23T04:35:58Z)
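One way to read the trajectory idea is as a score-vector construction. The sketch below uses a toy stand-in for the model's log-likelihood (the scoring rule, continuation pool, and function names are invented for illustration, not the paper's method): represent a text by the likelihoods assigned to a shared pool of continuations, then compare texts by comparing those vectors.
```python
import math

def log_prob(continuation: str, prefix: str) -> float:
    """Placeholder for the sum of token log-probabilities of `continuation`
    given `prefix` under a pretrained autoregressive LM; any real scoring
    backend could be plugged in here."""
    # Toy proxy so the sketch runs: shared vocabulary raises the score.
    shared = len(set(prefix.lower().split()) & set(continuation.lower().split()))
    return -len(continuation.split()) + shared

def trajectory_representation(text: str, continuation_pool: list[str]) -> list[float]:
    return [log_prob(c, text) for c in continuation_pool]

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

pool = ["it was very fast", "the dog barked loudly", "it was delicious"]
rep_a = trajectory_representation("The cheetah ran across the plain", pool)
rep_b = trajectory_representation("The sports car sped down the road", pool)
print(cosine(rep_a, rep_b))  # texts with similar likely continuations score as similar
```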
- Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers [93.9369467909176]
We explain language models as meta-optimizers and understand in-context learning as implicit finetuning.
We show that in-context learning behaves similarly to explicit finetuning from multiple perspectives.
The improved performance over vanilla attention further supports our understanding from another perspective.
arXiv Detail & Related papers (2022-12-20T18:58:48Z)
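The meta-optimizer reading rests on a duality between softmax-free (linear) attention and a gradient-descent update of a linear layer. The LaTeX sketch below condenses that correspondence with simplified notation; symbols such as $W_{ZSL}$ and $\Delta W_{\mathrm{ICL}}$ are used schematically, and the paper should be consulted for the exact derivation.
```latex
% Simplified sketch of the attention / gradient-descent duality (notation condensed).
% A linear layer updated by gradient descent from per-example errors $e_i$ on inputs $x_i'$:
\[
  F(x) = (W_0 + \Delta W)\,x, \qquad \Delta W = \sum_i e_i \otimes x_i' .
\]
% Softmax-free (linear) attention over demonstration tokens $x_i'$, applied to a query $q$:
\[
  F_{\mathrm{attn}}(q) \;\approx\; W_{ZSL}\, q \;+\; \Big(\textstyle\sum_i (W_V x_i') \otimes (W_K x_i')\Big)\, q
  \;=\; \big(W_{ZSL} + \Delta W_{\mathrm{ICL}}\big)\, q .
\]
% The demonstrations thus contribute an implicit, outer-product weight update on top of
% the zero-shot weights $W_{ZSL}$: in-context learning behaves like implicit finetuning.
```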
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Extracting Latent Steering Vectors from Pretrained Language Models [14.77762401765532]
We show that latent vectors can be extracted directly from language model decoders without fine-tuning.
Experiments show that there exist steering vectors, which, when added to the hidden states of the language model, generate a target sentence nearly perfectly.
We find that distances between steering vectors reflect sentence similarity when evaluated on a textual similarity benchmark.
arXiv Detail & Related papers (2022-05-10T19:04:37Z)
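The extraction recipe can be sketched with a toy frozen decoder (everything below is invented for illustration: the GRU decoder, dimensions, and optimizer settings are placeholders, not the authors' setup): freeze the model, then optimize a single vector that, added to the hidden states, makes the decoder reproduce a target sentence.
```python
import torch
import torch.nn as nn

vocab, d_hidden = 50, 32
decoder = nn.GRU(d_hidden, d_hidden, batch_first=True)   # stand-in for a frozen LM decoder
embed = nn.Embedding(vocab, d_hidden)
readout = nn.Linear(d_hidden, vocab)
for p in list(decoder.parameters()) + list(embed.parameters()) + list(readout.parameters()):
    p.requires_grad_(False)                               # model weights stay frozen

target = torch.randint(0, vocab, (1, 8))                  # the sentence we want to regenerate
z = torch.zeros(1, 1, d_hidden, requires_grad=True)       # the steering vector
opt = torch.optim.Adam([z], lr=1e-1)

for step in range(200):
    inputs = embed(target[:, :-1])                        # teacher forcing
    hidden, _ = decoder(inputs)
    logits = readout(hidden + z)                          # steer by adding z to hidden states
    loss = nn.functional.cross_entropy(logits.reshape(-1, vocab), target[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# After optimization, z acts as a fixed-length code for the target sentence; distances
# between such vectors can then be compared across sentences (e.g., textual similarity).
```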
- Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand? [87.20342701232869]
We investigate the abilities of ungrounded systems to acquire meaning.
We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence.
We find that assertions enable semantic emulation if all expressions in the language are referentially transparent.
However, if the language uses non-transparent patterns like variable binding, we show that emulation can become an uncomputable problem.
arXiv Detail & Related papers (2021-04-22T01:00:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.