Linear Latent World Models in Simple Transformers: A Case Study on
Othello-GPT
- URL: http://arxiv.org/abs/2310.07582v2
- Date: Thu, 12 Oct 2023 18:14:40 GMT
- Title: Linear Latent World Models in Simple Transformers: A Case Study on
Othello-GPT
- Authors: Dean S. Hazineh, Zechen Zhang, Jeffery Chiu
- Abstract summary: This paper meticulously examines a simple transformer trained for Othello, extending prior research to enhance comprehension of the emergent world model of Othello-GPT.
The investigation reveals that Othello-GPT encapsulates a linear representation of opposing pieces, a factor that causally steers its decision-making process.
- Score: 0.9208007322096532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models exhibit significant capabilities in decision-making and
logical deductions. Nonetheless, a continuing discourse persists regarding
their genuine understanding of the world as opposed to mere stochastic mimicry.
This paper meticulously examines a simple transformer trained for Othello,
extending prior research to enhance comprehension of the emergent world model
of Othello-GPT. The investigation reveals that Othello-GPT encapsulates a
linear representation of opposing pieces, a factor that causally steers its
decision-making process. This paper further elucidates the interplay between
the linear world representation and causal decision-making, and their
dependence on layer depth and model complexity. We have made the code public.
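To make the abstract's two claims concrete (a linearly decodable board representation, and causal steering of move predictions by editing that representation), the sketch below shows the general shape of a probing-and-intervention experiment. It is an illustrative sketch only, not the authors' released code: the hidden size, the per-square three-class labelling (empty / own piece / opposing piece), and the names LinearBoardProbe, train_probe, and flip_square are assumptions introduced here for the example.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for a small Othello-GPT; the paper's exact dimensions may differ.
D_MODEL = 512      # assumed residual-stream width
NUM_SQUARES = 64   # 8x8 Othello board
NUM_CLASSES = 3    # 0 = empty, 1 = own piece, 2 = opposing piece


class LinearBoardProbe(nn.Module):
    """One linear map from the residual stream to a 3-way state for every square."""

    def __init__(self, d_model: int = D_MODEL):
        super().__init__()
        self.proj = nn.Linear(d_model, NUM_SQUARES * NUM_CLASSES)

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        # resid: (batch, d_model) -> (batch, 64, 3) logits over square states
        return self.proj(resid).view(-1, NUM_SQUARES, NUM_CLASSES)


def train_probe(acts: torch.Tensor, labels: torch.Tensor,
                epochs: int = 10, lr: float = 1e-3) -> LinearBoardProbe:
    """Fit the probe with cross-entropy.

    acts:   (N, d_model) residual-stream activations collected at one layer
    labels: (N, 64) integer square states for the corresponding board positions
    High held-out accuracy is the usual evidence for a *linear* world model.
    """
    probe = LinearBoardProbe(acts.shape[-1])
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        logits = probe(acts)
        loss = loss_fn(logits.reshape(-1, NUM_CLASSES), labels.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return probe


def flip_square(resid: torch.Tensor, probe: LinearBoardProbe,
                square: int, target_class: int, alpha: float = 1.0) -> torch.Tensor:
    """Causal intervention: nudge an activation along the probe direction so that
    one square decodes to `target_class`, then let the model continue from it."""
    w = probe.proj.weight.view(NUM_SQUARES, NUM_CLASSES, -1)   # (64, 3, d_model)
    direction = w[square, target_class] - w[square].mean(dim=0)
    direction = direction / direction.norm()
    return resid + alpha * direction
```

High probe accuracy is evidence that the board state is linearly encoded; if nudging an activation along a probe direction also changes which moves the model predicts as legal, that supports the causal role the abstract describes.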
Related papers
- Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning [93.58897637077001]
This paper tries to learn and understand underlying semantic variations from distracting videos via offline-to-online latent distillation and flexible disentanglement constraints.
We pretrain the action-free video prediction model offline with disentanglement regularization to extract semantic knowledge from distracting videos.
For finetuning in the online environment, we exploit the knowledge from the pretrained model and introduce a disentanglement constraint to the world model.
arXiv Detail & Related papers (2025-03-11T13:50:22Z) - Revisiting the Othello World Model Hypothesis [46.84113324750507]
We analyze Othello board states and train a model to predict the next move based on previous moves.
We find that all models achieve up to 99% accuracy in unsupervised grounding and exhibit high similarity in the board features they learned.
arXiv Detail & Related papers (2025-03-06T13:26:58Z) - How GPT learns layer by layer [0.28926166547031595]
We analyze OthelloGPT, a GPT-based model trained on Othello gameplay, as a testbed for studying representation learning.
We compare Sparse Autoencoders (SAEs) with linear probes, finding that SAEs offer more robust, disentangled insights into compositional features.
We use SAEs to decode features related to tile color and tile stability, a previously unexamined feature that reflects complex gameplay concepts.
arXiv Detail & Related papers (2025-01-13T07:42:55Z) - Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond [101.15395503285804]
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI).
In this survey, we embark on a comprehensive exploration of the latest advancements in world models.
We examine challenges and limitations of world models, and discuss their potential future directions.
arXiv Detail & Related papers (2024-05-06T14:37:07Z) - CAGE: Causality-Aware Shapley Value for Global Explanations [4.017708359820078]
One way to explain an AI model is to elucidate the predictive importance of its input features.
Inspired by cooperative game theory, Shapley values offer a convenient way for quantifying the feature importance as explanations.
In particular, we introduce a novel sampling procedure for out-coalition features that respects the causal relations of the input features.
arXiv Detail & Related papers (2024-04-17T09:43:54Z) - On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z) - Towards an Understanding of Stepwise Inference in Transformers: A
Synthetic Graph Navigation Model [19.826983068662106]
We propose to study autoregressive Transformer models on a synthetic task that embodies the multi-step nature of problems where stepwise inference is generally most useful.
Despite its simplicity, we find we can empirically reproduce and analyze several phenomena observed at scale.
arXiv Detail & Related papers (2024-02-12T16:25:47Z) - AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers [14.147646140595649]
Large Language Models are prone to biased predictions and hallucinations.
Achieving faithful attributions for the entirety of a black-box transformer model while maintaining computational efficiency remains an unsolved challenge.
arXiv Detail & Related papers (2024-02-08T12:01:24Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Motif-guided Time Series Counterfactual Explanations [1.1510009152620664]
We propose a novel model that generates intuitive post-hoc counterfactual explanations.
We validated our model using five real-world time-series datasets from the UCR repository.
arXiv Detail & Related papers (2022-11-08T17:56:50Z) - Coalescing Global and Local Information for Procedural Text
Understanding [70.10291759879887]
A complete procedural understanding solution should combine three core aspects: local and global views of the inputs, and a global view of the outputs.
In this paper, we propose Coalescing Global and Local Information (CGLI), a new model that builds entity and timestep representations.
Experiments on a popular procedural text understanding dataset show that our model achieves state-of-the-art results.
arXiv Detail & Related papers (2022-08-26T19:16:32Z) - Abstract Interpretation for Generalized Heuristic Search in Model-Based
Planning [50.96320003643406]
Domain-general model-based planners often derive their generality by constructing search heuristics through the relaxation of symbolic world models.
We illustrate how abstract interpretation can serve as a unifying framework for these abstractions, extending the reach of search to richer world models.
These abstractions can also be integrated with learning, allowing agents to jumpstart planning in novel world models via abstraction-derived information.
arXiv Detail & Related papers (2022-08-05T00:22:11Z) - Structural Causal Models Reveal Confounder Bias in Linear Program
Modelling [26.173103098250678]
We investigate whether adversarial-style attacks, a phenomenon well known from classification, might be more general in nature and also arise outside classical classification tasks.
Specifically, we consider the base class of Linear Programs (LPs).
We show the direct influence of the Structural Causal Model (SCM) onto the subsequent LP optimization, which ultimately exposes a notion of confounding in LPs.
arXiv Detail & Related papers (2021-05-26T17:19:22Z) - Is Supervised Syntactic Parsing Beneficial for Language Understanding?
An Empirical Investigation [71.70562795158625]
Traditional NLP has long held (supervised) syntactic parsing to be necessary for successful higher-level semantic language understanding (LU).
The recent advent of end-to-end neural models, self-supervised via language modeling (LM), and their success on a wide range of LU tasks call this belief into question.
We empirically investigate the usefulness of supervised parsing for semantic LU in the context of LM-pretrained transformer networks.
arXiv Detail & Related papers (2020-08-15T21:03:36Z)