How does GPT-2 compute greater-than?: Interpreting mathematical
abilities in a pre-trained language model
- URL: http://arxiv.org/abs/2305.00586v5
- Date: Thu, 2 Nov 2023 10:55:18 GMT
- Title: How does GPT-2 compute greater-than?: Interpreting mathematical
abilities in a pre-trained language model
- Authors: Michael Hanna, Ollie Liu and Alexandre Variengien
- Abstract summary: We use mechanistic interpretability techniques to explain the mathematical abilities of GPT-2 small.
We show that GPT-2 small's final multi-layer perceptrons boost the probability of end years greater than the start year.
Our results suggest that GPT-2 small computes greater-than using a complex but general mechanism.
- Score: 52.92472140375308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models can be surprisingly adept at tasks they were not
explicitly trained on, but how they implement these capabilities is poorly
understood. In this paper, we investigate the basic mathematical abilities
often acquired by pre-trained language models. Concretely, we use mechanistic
interpretability techniques to explain the (limited) mathematical abilities of
GPT-2 small. As a case study, we examine its ability to take in sentences such
as "The war lasted from the year 1732 to the year 17", and predict valid
two-digit end years (years > 32). We first identify a circuit, a small subset
of GPT-2 small's computational graph that computes this task's output. Then, we
explain the role of each circuit component, showing that GPT-2 small's final
multi-layer perceptrons boost the probability of end years greater than the
start year. Finally, we find related tasks that activate our circuit. Our
results suggest that GPT-2 small computes greater-than using a complex but
general mechanism that activates across diverse contexts.
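To make the task and measurement concrete, the following is a minimal sketch (not the authors' released code; the prompt wording follows the abstract, while the probability-difference measure over two-digit year tokens and the single-token filtering are illustrative assumptions) that probes GPT-2 small's greater-than behaviour with the Hugging Face transformers library:

```python
# Minimal sketch (not the authors' code): probe GPT-2 small's greater-than behaviour.
# The prompt format follows the abstract; the probability-difference measure and the
# single-token filtering are assumptions made for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def prob_difference(start_yy: int, century: int = 17) -> float:
    """Probability mass on end years > start year minus mass on end years <= it."""
    prompt = f"The war lasted from the year {century}{start_yy:02d} to the year {century}"
    input_ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]      # logits for the next token
    probs = torch.softmax(logits, dim=-1)

    greater, not_greater = 0.0, 0.0
    for yy in range(100):                            # candidate two-digit end years 00..99
        pieces = tok.encode(f"{yy:02d}")
        if len(pieces) != 1:                         # skip years that are not a single BPE token
            continue
        p = probs[pieces[0]].item()
        if yy > start_yy:
            greater += p
        else:
            not_greater += p
    return greater - not_greater

print(prob_difference(start_yy=32))                  # positive => "greater-than" behaviour
```

A clearly positive return value means the model concentrates next-token probability on end years greater than the start year, which is the behaviour the identified circuit and its final MLPs are shown to implement.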
Related papers
- Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks [5.358878931933351]
We study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks.
Specifically, we consider a finite collection of linear modular functions $z = a\,x + b\,y \;\mathrm{mod}\; p$, labeled by the vector $(a, b) \in \mathbb{Z}_p^2$ (a minimal task-sampling sketch appears after this list).
arXiv Detail & Related papers (2024-06-04T17:59:36Z) - WizardMath: Empowering Mathematical Reasoning for Large Language Models
via Reinforced Evol-Instruct [128.89645483139236]
We present WizardMath, which enhances the mathematical reasoning abilities of Llama-2, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math.
Our model even surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, and simultaneously surpasses Text-davinci, PaLM-1 and GPT-3 on MATH.
arXiv Detail & Related papers (2023-08-18T14:23:21Z) - SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) of a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of the finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z) - Towards Automated Circuit Discovery for Mechanistic Interpretability [7.605075513099429]
This paper systematizes the mechanistic interpretability process that prior researchers have followed.
By varying the dataset, metric, and units under investigation, researchers can understand the functionality of each component.
We propose several algorithms and reproduce previous interpretability results to validate them.
arXiv Detail & Related papers (2023-04-28T17:36:53Z) - Classification of integers based on residue classes via modern deep
learning algorithms [3.6396223542930772]
We tested multiple deep learning architectures and feature engineering approaches on classifying integers based on their residues when divided by small prime numbers.
We also evaluated Automated Machine Learning platforms from Amazon, Google and Microsoft, and found that they failed on this task without appropriately engineered features.
In conclusion, feature engineering remains an important task to improve performance and increase interpretability of machine-learning models.
arXiv Detail & Related papers (2023-04-03T19:53:31Z) - Mathematical Capabilities of ChatGPT [35.71603158908465]
We release two new datasets: GHOSTS and miniGHOSTS.
These are the first natural-language datasets curated by working researchers in mathematics.
We benchmark the models on a range of fine-grained performance metrics.
arXiv Detail & Related papers (2023-01-31T18:59:03Z) - The Unreliability of Explanations in Few-Shot In-Context Learning [50.77996380021221]
We focus on two NLP tasks that involve reasoning over text, namely question answering and natural language inference.
We show that explanations judged as good by humans, i.e. those that are logically consistent with the input, usually indicate more accurate predictions.
We present a framework for calibrating model predictions based on the reliability of the explanations.
arXiv Detail & Related papers (2022-05-06T17:57:58Z) - GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [84.33607245023049]
We propose and develop a family of language models named GLaM (Generalist Language Model).
GLaM uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants.
It consumes only 1/3 of the energy used to train GPT-3 and requires half the FLOPs for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.
arXiv Detail & Related papers (2021-12-13T18:58:19Z) - Kronecker Decomposition for GPT Compression [8.60086973058282]
GPT is an auto-regressive Transformer-based pre-trained language model which has attracted a lot of attention in the natural language processing (NLP) domain.
Despite its superior performance, GPT can be prohibitively expensive to deploy on devices with limited computational power or memory.
In this work, we use Kronecker decomposition to compress the linear mappings of the GPT-2 model.
arXiv Detail & Related papers (2021-10-15T15:28:39Z) - MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive.
ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator.
We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z)
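As referenced above, here is a minimal sketch (an illustration under assumed conventions, not the code of the "Learning to grok" paper) of sampling one task from the modular arithmetic family $z = a\,x + b\,y \;\mathrm{mod}\; p$:

```python
# Minimal sketch: sample one task from the family z = (a*x + b*y) mod p,
# labelled by the vector (a, b) in Z_p^2, together with a few in-context examples.
# The prime p and the number of examples are illustrative assumptions.
import random

def sample_task(p: int = 97, n_examples: int = 4):
    a, b = random.randrange(p), random.randrange(p)   # hidden task label (a, b)
    examples = []
    for _ in range(n_examples):
        x, y = random.randrange(p), random.randrange(p)
        z = (a * x + b * y) % p                       # linear modular function
        examples.append((x, y, z))
    return (a, b), examples

print(sample_task())
```

Each task is identified by its hidden label $(a, b)$; a model learning in context must infer that label from the example triples.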