Related papers: Revisiting the Othello World Model Hypothesis

Revisiting the Othello World Model Hypothesis

URL: http://arxiv.org/abs/2503.04421v1
Date: Thu, 06 Mar 2025 13:26:58 GMT
Title: Revisiting the Othello World Model Hypothesis
Authors: Yifei Yuan, Anders Søgaard,
Abstract summary: We analyze Othello board states and train a model to predict the next move based on previous moves.<n>We find that all models achieve up to 99% accuracy in unsupervised grounding and exhibit high similarity in the board features they learned.
Score: 46.84113324750507
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Li et al. (2023) used the Othello board game as a test case for the ability of GPT-2 to induce world models, and were followed up by Nanda et al. (2023b). We briefly discuss the original experiments, expanding them to include more language models with more comprehensive probing. Specifically, we analyze sequences of Othello board states and train the model to predict the next move based on previous moves. We evaluate seven language models (GPT-2, T5, Bart, Flan-T5, Mistral, LLaMA-2, and Qwen2.5) on the Othello task and conclude that these models not only learn to play Othello, but also induce the Othello board layout. We find that all models achieve up to 99% accuracy in unsupervised grounding and exhibit high similarity in the board features they learned. This provides considerably stronger evidence for the Othello World Model Hypothesis than previous works.

Related papers

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning [231.11339402237903]
We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA. It demonstrates excellent reasoning abilities in STEM and coding.
arXiv Detail & Related papers (2025-04-10T17:10:51Z)
How GPT learns layer by layer [0.28926166547031595]
We analyze OthelloGPT, a GPT-based model trained on Othello gameplay, as a testbed for studying representation learning. We compare Sparse Autoencoders (SAEs) with linear probes, finding that SAEs offer more robust, disentangled insights into compositional features. We use SAEs to decode features related to tile color and tile stability, a previously unexamined feature that reflects complex gameplay concepts.
arXiv Detail & Related papers (2025-01-13T07:42:55Z)
Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models [0.0]
We train a GPT model on Othello games and find that the model learned an internal representation of the board state. We extend this work into the more complex domain of chess, training on real games and investigating our model's internal representations. Unlike Li et al.'s prior synthetic dataset approach, our analysis finds that the model also learns to estimate latent variables like player skill to better predict the next character.
arXiv Detail & Related papers (2024-03-21T18:53:23Z)
Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT [0.9208007322096532]
This paper meticulously examines a simple transformer trained for Othello, extending prior research to enhance comprehension of the emergent world model of Othello-GPT. The investigation reveals that Othello-GPT encapsulates a linear representation of opposing pieces, a factor that causally steers its decision-making process.
arXiv Detail & Related papers (2023-10-11T15:20:07Z)
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task [75.35278593566068]
Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello.
arXiv Detail & Related papers (2022-10-24T16:29:55Z)
Word Play for Playing Othello (Reverses) [0.0]
Research applies both the larger (GPT-3) and smaller (GPT-2) language models to explore the complex strategies for the game of Othello (or Reverses) The language model automatically captures or emulates championship-level strategies. The fine-tuned GPT-2 model generates Othello games ranging from 13-71% completion, while the larger GPT-3 model reaches 41% of a complete game.
arXiv Detail & Related papers (2022-07-18T17:13:32Z)
Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups. We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective. Our model also achieve strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models. Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers. We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
To what extent do human explanations of model behavior align with actual model behavior? [91.67905128825402]
We investigated the extent to which human-generated explanations of models' inference decisions align with how models actually make these decisions. We defined two alignment metrics that quantify how well natural language human explanations align with model sensitivity to input words. We find that a model's alignment with human explanations is not predicted by the model's accuracy on NLI.
arXiv Detail & Related papers (2020-12-24T17:40:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.