Related papers: MetaOthello: A Controlled Study of Multiple World Models in Transformers

URL: http://arxiv.org/abs/2602.23164v1
Date: Thu, 26 Feb 2026 16:28:09 GMT
Title: MetaOthello: A Controlled Study of Multiple World Models in Transformers
Authors: Aviral Chawla, Galen Hall, Juniper Lovato,
Abstract summary: Previous experiments on Othello-playing neural networks test world-model learning but focus on a single game with a single set of rules. We introduce MetaOthello, a controlled suite of Othello variants with shared syntax but different rules or tokenizations. We find that transformers trained on mixed-game data do not partition their capacity into isolated sub-models.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Foundation models must handle multiple generative processes, yet mechanistic interpretability largely studies capabilities in isolation; it remains unclear how a single transformer organizes multiple, potentially conflicting "world models". Previous experiments on Othello-playing neural networks test world-model learning but focus on a single game with a single set of rules. We introduce MetaOthello, a controlled suite of Othello variants with shared syntax but different rules or tokenizations, and train small GPTs on mixed-variant data to study how multiple world models are organized in a shared representation space. We find that transformers trained on mixed-game data do not partition their capacity into isolated sub-models; instead, they converge on a mostly shared board-state representation that transfers causally across variants. Linear probes trained on one variant can intervene on another's internal state with effectiveness approaching that of matched probes. For isomorphic games with token remapping, representations are equivalent up to a single orthogonal rotation that generalizes across layers. When rules partially overlap, early layers maintain game-agnostic representations while a middle layer identifies game identity, and later layers specialize. MetaOthello offers a path toward understanding not just whether transformers learn world models, but how they organize many at once.
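One standard way to test a claim that two sets of representations are "equivalent up to a single orthogonal rotation" is an orthogonal Procrustes fit. The sketch below is not the authors' code and does not reproduce their procedure. It assumes activations for matched positions from two isomorphic variants are stacked row-wise into matrices `X` and `Y`, it uses synthetic data with a planted rotation in place of real model activations, and the helper names `fit_orthogonal_map` and `relative_residual` are hypothetical.

```python
import numpy as np


def fit_orthogonal_map(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: the orthogonal R minimizing ||X @ R - Y||_F.

    X, Y: (n_samples, d_model) activations for matched inputs (assumed layout).
    """
    # SVD of the cross-covariance X^T Y = U S Vt; the optimal orthogonal map is U @ Vt.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def relative_residual(X: np.ndarray, Y: np.ndarray, R: np.ndarray) -> float:
    """Frobenius norm of the mapping error, relative to the norm of Y."""
    return float(np.linalg.norm(X @ R - Y) / np.linalg.norm(Y))


# Synthetic stand-in data: a hidden rotation Q relates the two "variants".
rng = np.random.default_rng(0)
d = 64
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # planted orthogonal matrix

# Fit the rotation on one set of activations (e.g. a single layer), with small noise.
X_fit = rng.standard_normal((500, d))
Y_fit = X_fit @ Q + 0.01 * rng.standard_normal((500, d))
R = fit_orthogonal_map(X_fit, Y_fit)

# Reuse the same R on fresh activations, standing in for a different layer.
X_new = rng.standard_normal((200, d))
Y_new = X_new @ Q
print(f"held-out relative residual: {relative_residual(X_new, Y_new, R):.4f}")
```

The design choice here is to fit once and evaluate on held-out activations: a small residual on data the map was not fitted to is what "a single rotation that generalizes" would predict, whereas a map fitted to unrelated targets would leave a residual on the order of the signal itself.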

Related papers

What if Othello-Playing Language Models Could See?
We introduce VISOTHELLO, a multi-modal model trained jointly on move sequences and board images. We evaluate robustness under semantically irrelevant perturbations and analyze the consistency of cross-modal alignment.
arXiv (2025-07-19)
How GPT learns layer by layer
We analyze OthelloGPT, a GPT-based model trained on Othello gameplay, as a testbed for studying representation learning. We compare Sparse Autoencoders (SAEs) with linear probes, finding that SAEs offer more robust, disentangled insights into compositional features. We use SAEs to decode features related to tile color and tile stability, a previously unexamined feature that reflects complex gameplay concepts.
arXiv (2025-01-13)
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to understand the training procedure of multilayer Transformers. JoMA predicts that the attention first becomes sparse (to learn salient tokens), then dense (to learn less salient tokens) in the presence of nonlinear activations. We leverage JoMA to explain how tokens are combined to form hierarchies in multilayer Transformers, when the input tokens are generated by a latent hierarchical generative model.
arXiv (2023-10-01)
Instruction-Following Agents with Multimodal Transformer
We propose a simple yet effective model for robots to solve instruction-following tasks in vision-based environments. Our method consists of a multimodal transformer that encodes visual observations and language instructions. We show that this unified transformer model outperforms all state-of-the-art pre-trained or trained-from-scratch methods in both single-task and multi-task settings.
arXiv (2022-10-24)
Classical Sequence Match is a Competitive Few-Shot One-Class Learner
We investigate the few-shot one-class problem, which takes a known sample as a reference to detect whether an unknown instance belongs to the same class. It is shown that, with meta-learning, the classical sequence match method, i.e. Compare-Aggregate, significantly outperforms transformer-based methods.
arXiv (2022-09-14)
Hierarchical Local-Global Transformer for Temporal Sentence Grounding
This paper studies the multimedia problem of temporal sentence grounding, which aims to accurately locate the specific segment of an untrimmed video that matches a given sentence query.
arXiv (2022-08-31)
Multi-Game Decision Transformers
We show that a single transformer-based model can play a suite of up to 46 Atari games simultaneously at close-to-human performance. We compare several approaches in this multi-game setting, such as online and offline RL methods and behavioral cloning. We find that our Multi-Game Decision Transformer models offer the best scalability and performance.
arXiv (2022-05-30)
Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) for face parsing. Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection. Our method achieves new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv (2022-03-28)
AAformer: Auto-Aligned Transformer for Person Re-Identification
We introduce an alignment scheme in transformer architecture for the first time. We propose the auto-aligned transformer (AAformer) to automatically locate both the human parts and nonhuman ones at patch level. AAformer integrates the part alignment into the self-attention and the output [PART]s can be directly used as part features for retrieval.
arXiv (2021-04-02)
Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation
We study the effects of using mid-level visual representations as generic and easy-to-decode perceptual state in an end-to-end RL framework. We show that they aid generalization, improve sample complexity, and lead to a higher final performance. In practice, this means that mid-level representations could be used to successfully train policies for tasks where domain randomization and learning-from-scratch failed.
arXiv (2020-11-13)
