Interpreting Grokked Transformers in Complex Modular Arithmetic
- URL: http://arxiv.org/abs/2402.16726v2
- Date: Tue, 27 Feb 2024 04:58:24 GMT
- Title: Interpreting Grokked Transformers in Complex Modular Arithmetic
- Authors: Hiroki Furuta, Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo
- Abstract summary: We observe the internal circuits learned through grokking in complex modular arithmetic via interpretable reverse engineering.
Our empirical analysis emphasizes the importance of holistic evaluation across various combinations of modular operations.
- Score: 31.78132974646383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Grokking has been actively explored to reveal the mystery of delayed
generalization. Identifying interpretable algorithms inside grokked models
offers a suggestive hint toward understanding this mechanism. In this work, going
beyond the simplest and best-studied case of modular addition, we observe the
internal circuits learned through grokking in complex modular arithmetic via
interpretable reverse engineering, which highlights significant differences in
their dynamics: subtraction imposes a strong asymmetry on the Transformer;
multiplication requires cosine-biased components at all frequencies in the
Fourier domain; polynomials often result in a superposition of the patterns from
elementary arithmetic, although clear patterns do not emerge in challenging
cases; and grokking occurs easily even in higher-degree formulas with basic
symmetric and alternating expressions. We also introduce novel progress measures
for modular arithmetic, Fourier Frequency Sparsity and Fourier Coefficient
Ratio, which not only indicate late generalization but also characterize the
distinctive internal representations of grokked models for each modular
operation. Our empirical analysis emphasizes the importance of holistic
evaluation across various combinations of modular operations.
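The abstract names Fourier Frequency Sparsity (FFS) and Fourier Coefficient Ratio (FCR) but does not spell out their formulas here. Below is a minimal sketch of plausible Fourier-based measures computed over a learned token-embedding matrix for a modulus p; the function names, the top-k energy definition of sparsity, and the cosine-vs-sine energy ratio are illustrative assumptions, not the paper's exact definitions.

```python
# Hedged sketch: plausible Fourier-based progress measures for a grokked
# modular-arithmetic model. The thresholds and normalizations below are
# assumptions and may differ from the paper's definitions of FFS and FCR.
import numpy as np

def fourier_components(embed: np.ndarray, p: int):
    """Project the embedding of each token onto cos/sin waves of frequency k over Z_p."""
    tokens = np.arange(p)
    freqs = np.arange(1, p // 2 + 1)
    cos = np.cos(2 * np.pi * freqs[:, None] * tokens[None, :] / p)   # (num_freqs, p)
    sin = np.sin(2 * np.pi * freqs[:, None] * tokens[None, :] / p)
    return cos @ embed, sin @ embed                                  # (num_freqs, d_model) each

def fourier_frequency_sparsity(embed: np.ndarray, p: int, top_k: int = 2) -> float:
    """Assumed FFS: fraction of spectral energy held by the top_k frequencies (1.0 = very sparse)."""
    cos_coef, sin_coef = fourier_components(embed, p)
    power = (cos_coef ** 2 + sin_coef ** 2).sum(axis=1)              # energy per frequency
    power = power / power.sum()
    return float(np.sort(power)[::-1][:top_k].sum())

def fourier_coefficient_ratio(embed: np.ndarray, p: int) -> float:
    """Assumed FCR: share of spectral energy carried by cosine components (0.5 = no bias)."""
    cos_coef, sin_coef = fourier_components(embed, p)
    cos_energy = float((cos_coef ** 2).sum())
    sin_energy = float((sin_coef ** 2).sum())
    return cos_energy / (cos_energy + sin_energy)

# Toy check: a random (ungrokked) embedding spreads energy across frequencies,
# so FFS stays low and FCR sits near 0.5.
p, d_model = 97, 128
W_E = np.random.default_rng(0).normal(size=(p, d_model))
print(fourier_frequency_sparsity(W_E, p), fourier_coefficient_ratio(W_E, p))
```

Under these assumptions, a grokked modular-addition network would concentrate its embedding energy on a few key frequencies (FFS near 1), while the cosine bias the abstract reports for multiplication would push FCR above 0.5; both readings should be checked against the paper's own definitions.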
Related papers
- Learning Linear Attention in Polynomial Time [115.68795790532289]
We provide the first results on learnability of single-layer Transformers with linear attention.
We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS.
We show how to efficiently identify training datasets for which every empirical risk minimizer is equivalent to the linear Transformer (a numerical sketch of the linear-predictor view appears after this list).
arXiv Detail & Related papers (2024-10-14T02:41:01Z)
- Generalization of Modular Spread Complexity for Non-Hermitian Density Matrices [0.0]
In this work we generalize the concept of modular spread complexity to the cases where the reduced density matrix is non-Hermitian.
We define the quantity pseudo-capacity which generalizes capacity of entanglement, and corresponds to the early modular-time measure of pseudo-modular complexity.
We present analytical calculations for 2-level systems and 4-qubit models, and then numerically investigate the quantum phase transition of the transverse-field Ising model.
arXiv Detail & Related papers (2024-10-07T17:59:16Z)
- In-Context Learning with Representations: Contextual Generalization of Trained Transformers [66.78052387054593]
In-context learning (ICL) refers to the capability of pretrained large language models to learn a new task from a few examples provided at inference time.
This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks.
arXiv Detail & Related papers (2024-08-19T16:47:46Z)
- Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations [56.78271181959529]
Generalized Additive Models (GAMs) can capture non-linear relationships between variables and targets, but they cannot capture intricate feature interactions.
We propose Shape Arithmetic Expressions (SHAREs), which fuse the flexible shape functions of GAMs with the complex feature interactions found in mathematical expressions.
We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints.
arXiv Detail & Related papers (2024-04-15T13:44:01Z)
- Discovering modular solutions that generalize compositionally [55.46688816816882]
We show that identification up to linear transformation purely from demonstrations is possible without having to learn an exponential number of module combinations.
We further demonstrate empirically that meta-learning from finite data can discover modular policies that generalize compositionally in a number of complex environments.
arXiv Detail & Related papers (2023-12-22T16:33:50Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- High-Dimensional Undirected Graphical Models for Arbitrary Mixed Data [2.2871867623460207]
In many applications data span variables of different types, whose principled joint analysis is nontrivial.
Recent advances have shown how the binary-continuous case can be tackled, but the general mixed variable type regime remains challenging.
We propose flexible and scalable methodology for data with variables of entirely general mixed type.
arXiv Detail & Related papers (2022-11-21T18:21:31Z)
- Inductive Biases and Variable Creation in Self-Attention Mechanisms [25.79946667926312]
This work provides a theoretical analysis of the inductive biases of self-attention modules.
Our focus is to rigorously establish which functions and long-range dependencies self-attention blocks prefer to represent.
Our main result shows that bounded-norm Transformer layers create sparse variables.
arXiv Detail & Related papers (2021-10-19T16:36:19Z)
- A Compositional Atlas of Tractable Circuit Operations: From Simple Transformations to Complex Information-Theoretic Queries [44.36335714431731]
We show how complex inference scenarios for machine learning can be represented in terms of tractable modular operations over circuits.
We derive a unified framework for reasoning about tractable models that generalizes several results in the literature and opens up novel tractable inference scenarios.
arXiv Detail & Related papers (2021-02-11T17:26:32Z)
- A Deep Joint Sparse Non-negative Matrix Factorization Framework for Identifying the Common and Subject-specific Functional Units of Tongue Motion During Speech [7.870139900799612]
We develop a new deep learning framework to identify common and subject-specific functional units of tongue motion during speech.
We transform NMF with sparse and graph regularizations into modular architectures akin to deep neural networks.
arXiv Detail & Related papers (2020-07-09T15:05:44Z)
- From Sets to Multisets: Provable Variational Inference for Probabilistic Integer Submodular Models [82.95892656532696]
Submodular functions have been studied extensively in machine learning and data mining.
In this work, we propose a continuous DR-submodular extension for integer submodular functions.
We formulate a new probabilistic model which is defined through integer submodular functions.
arXiv Detail & Related papers (2020-06-01T22:20:45Z)
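As a companion to the first related paper above (Learning Linear Attention in Polynomial Time), the following is a minimal numerical sketch, under the standard softmax-free attention formula Y = (X Wq)(X Wk)^T (X Wv), of why a single-layer linear-attention output is a linear predictor in a feature space of degree-3 monomials of the input. It is an illustrative reconstruction, not that paper's RKHS construction.

```python
# Hedged sketch: single-layer linear attention as a linear predictor.
# The feature map below (degree-3 monomials of the input) is an assumption
# chosen to match Y = X Wq Wk^T X^T X Wv; it is not the cited paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 4                                   # sequence length, model width
X = rng.normal(size=(T, d))                   # input sequence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Direct linear-attention output (no softmax).
Y = (X @ Wq) @ (X @ Wk).T @ (X @ Wv)

# Same output as a *linear* predictor: features depend only on X,
# weights depend only on the parameters.
A = Wq @ Wk.T                                 # merged query-key matrix
G = X.T @ X                                   # Gram matrix of the input
Phi = np.einsum("ta,bc->tabc", X, G)          # feature tensor phi(X), shape (T, d, d, d)
Theta = np.einsum("ab,cn->abcn", A, Wv)       # weight tensor theta,   shape (d, d, d, d)
Y_linear = np.einsum("tabc,abcn->tn", Phi, Theta)

assert np.allclose(Y, Y_linear)               # identical predictions
```

The check passes because Y = X Wq Wk^T X^T X Wv factors into a tensor that depends only on X and a tensor that depends only on the parameters, which is the sense in which the prediction is linear in the (tensor-product) weights.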
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.