Related papers: Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

URL: http://arxiv.org/abs/2210.13382v5
Date: Wed, 26 Jun 2024 14:27:49 GMT
Title: Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
Authors: Kenneth Li, Aspen K. Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
Abstract summary: Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello.
Score: 75.35278593566068
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello. Although the network has no a priori knowledge of the game or its rules, we uncover evidence of an emergent nonlinear internal representation of the board state. Interventional experiments indicate this representation can be used to control the output of the network and create "latent saliency maps" that can help explain predictions in human terms.

Related papers

Aligned explanations in neural networks [0.8594140167290095]
We argue that explanations must be directly linked to predictions, rather than serving as post-hoc rationalizations. We present model readability as a design principle enabling alignment, and PiNets as a modeling framework to pursue it in a deep learning context.
arXiv Detail & Related papers (2026-01-07T20:35:02Z)
Predicting the Performance of Black-box LLMs through Self-Queries [60.87193950962585]
As large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial. In this paper, we extract features of LLMs in a black-box manner by using follow-up prompts and taking the probabilities of different responses as representations. We demonstrate that training a linear model on these low-dimensional representations produces reliable predictors of model performance at the instance level.
arXiv Detail & Related papers (2025-01-02T22:26:54Z)
A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment [5.156443267442059]
We ask whether generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learn a world model from which sequences are generated one token at a time. We find that the GPT model is likely to generate legal next moves for out-of-distribution sequences for which a causal structure is encoded in the attention mechanism with high confidence. In cases where it generates illegal moves, it also fails to capture a causal structure.
arXiv Detail & Related papers (2024-12-10T12:05:03Z)
States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly [72.24742240125369]
In this paper, we uncover the intrinsic ability of LLMs to perform extended sequences of calculations without relying on chain-of-thought step-by-step solutions. Remarkably, the most advanced models can directly output the results of two-digit number additions with lengths extending up to 15 addends.
arXiv Detail & Related papers (2024-07-16T06:27:22Z)
Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models [0.0]
Prior work trained a GPT model on Othello games and found that the model learned an internal representation of the board state. We extend this work into the more complex domain of chess, training on real games and investigating our model's internal representations. Unlike Li et al.'s prior synthetic dataset approach, our analysis finds that the model also learns to estimate latent variables like player skill to better predict the next character.
arXiv Detail & Related papers (2024-03-21T18:53:23Z)
Emergent Linear Representations in World Models of Self-Supervised Sequence Models [5.712566125397807]
Prior work showed that an Othello-playing neural network learned nonlinear models of the board state. We show that probing for "my colour" vs. "opponent's colour" may be a simple yet powerful way to interpret the model's internal state.
arXiv Detail & Related papers (2023-09-02T13:37:34Z)
Towards Few-shot Inductive Link Prediction on Knowledge Graphs: A Relational Anonymous Walk-guided Neural Process Approach [49.00753238429618]
Few-shot inductive link prediction on knowledge graphs aims to predict missing links for unseen entities with few-shot links observed. Recent inductive methods utilize the sub-graphs around unseen entities to obtain the semantics and predict links inductively. We propose a novel relational anonymous walk-guided neural process for few-shot inductive link prediction on knowledge graphs, denoted as RawNP.
arXiv Detail & Related papers (2023-06-26T12:02:32Z)
Towards Prototype-Based Self-Explainable Graph Neural Network [37.90997236795843]
We study a novel problem of learning prototype-based self-explainable GNNs that can simultaneously give accurate predictions and prototype-based explanations of those predictions. The learned prototypes are also used to simultaneously make predictions for a test instance and provide instance-level explanations.
arXiv Detail & Related papers (2022-10-05T00:47:42Z)
Hidden Schema Networks [3.4123736336071864]
We introduce a novel neural language model that enforces, via inductive biases, explicit relational structures. The model encodes sentences into sequences of symbols, which correspond to nodes visited by biased random walkers. We show that the model is able to uncover ground-truth graphs from artificially generated datasets of random token sequences.
arXiv Detail & Related papers (2022-07-08T09:26:19Z)
Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules. Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
Temporal Graph Network Embedding with Causal Anonymous Walks Representations [54.05212871508062]
We propose a novel approach for dynamic network representation learning based on Temporal Graph Network. For evaluation, we provide a benchmark pipeline for the evaluation of temporal network embeddings. We show the applicability and superior performance of our model in the real-world downstream graph machine learning task provided by one of the top European banks.
arXiv Detail & Related papers (2021-08-19T15:39:52Z)
A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation [63.042651834453544]
We show that the unsupervised learning of disentangled representations is impossible without inductive biases on both the models and the data. We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision.
arXiv Detail & Related papers (2020-10-27T10:17:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.