Grokking modular arithmetic
- URL: http://arxiv.org/abs/2301.02679v1
- Date: Fri, 6 Jan 2023 19:00:01 GMT
- Title: Grokking modular arithmetic
- Authors: Andrey Gromov
- Abstract summary: We present a simple neural network that can learn modular arithmetic tasks.
We show that the network exhibits a sudden jump in generalization known as ``grokking''.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a simple neural network that can learn modular arithmetic tasks
and exhibits a sudden jump in generalization known as ``grokking''. Concretely,
we present (i) fully-connected two-layer networks that exhibit grokking on
various modular arithmetic tasks under vanilla gradient descent with the MSE
loss function in the absence of any regularization; (ii) evidence that grokking
modular arithmetic corresponds to learning specific feature maps whose
structure is determined by the task; (iii) analytic expressions for the weights
-- and thus for the feature maps -- that solve a large class of modular
arithmetic tasks; and (iv) evidence that these feature maps are also found by
vanilla gradient descent as well as AdamW, thereby establishing complete
interpretability of the representations learnt by the network.
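To make the setup in the abstract concrete, below is a minimal sketch of the kind of experiment it describes: a two-layer fully connected network trained on modular addition with full-batch vanilla gradient descent, MSE loss, and no explicit regularization. The modulus, width, activation, learning rate, train fraction, and step count here are illustrative assumptions, not the paper's exact hyperparameters.

```python
# Minimal sketch (assumed hyperparameters) of a two-layer network trained on
# (a + b) mod p with full-batch vanilla gradient descent and MSE loss,
# without any regularization, as described in the abstract.
import torch
import torch.nn as nn

torch.manual_seed(0)
p, width = 97, 512                                    # modulus and hidden width (assumed)

# All p*p input pairs, encoded as concatenated one-hot vectors; targets are one-hot (a+b) mod p.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
x = torch.cat([nn.functional.one_hot(pairs[:, 0], p),
               nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()
y = nn.functional.one_hot((pairs[:, 0] + pairs[:, 1]) % p, p).float()

# Train on roughly half of all pairs, test on the rest.
perm = torch.randperm(p * p)
train, test = perm[: p * p // 2], perm[p * p // 2:]

# Two-layer fully connected network; ReLU is a generic choice here, not necessarily the paper's activation.
model = nn.Sequential(nn.Linear(2 * p, width), nn.ReLU(), nn.Linear(width, p))
opt = torch.optim.SGD(model.parameters(), lr=1e-1)    # vanilla GD, no weight decay
loss_fn = nn.MSELoss()

for step in range(100_000):                           # grokking shows up only after long training
    opt.zero_grad()
    loss_fn(model(x[train]), y[train]).backward()
    opt.step()
    if step % 5_000 == 0:
        with torch.no_grad():
            acc = (model(x[test]).argmax(-1) == y[test].argmax(-1)).float().mean()
        print(f"step {step}: test accuracy {acc:.3f}")
```

In a run of this kind, test accuracy can remain near chance long after the training set is fit, then jump sharply; per item (iii) of the abstract, the first-layer weights found at that point form periodic feature maps whose structure is determined by the task, which is what makes the learned representations interpretable.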
Related papers
- Breaking Neural Network Scaling Laws with Modularity [8.482423139660153]
We show how the amount of training data required to generalize varies with the intrinsic dimensionality of a task's input.
We then develop a novel learning rule for modular networks to exploit this advantage.
arXiv Detail & Related papers (2024-09-09T16:43:09Z)
- Grokking Modular Polynomials [5.358878931933351]
We extend the class of analytical solutions to include modular multiplication as well as modular addition with many terms.
We show that real networks trained on these datasets learn similar solutions upon generalization (grokking).
We hypothesize a classification of modular polynomials into learnable and non-learnable classes via neural network training.
arXiv Detail & Related papers (2024-06-05T17:59:35Z)
- Discovering modular solutions that generalize compositionally [55.46688816816882]
We show that identification up to linear transformation purely from demonstrations is possible without having to learn an exponential number of module combinations.
We further demonstrate empirically that meta-learning from finite data can discover modular policies that generalize compositionally in a number of complex environments.
arXiv Detail & Related papers (2023-12-22T16:33:50Z)
- Randomly Weighted Neuromodulation in Neural Networks Facilitates Learning of Manifolds Common Across Tasks [1.9580473532948401]
Geometric Sensitive Hashing functions are neural network models that learn class-specific manifold geometry in supervised learning.
We show that a randomly weighted neural network with a neuromodulation system can realize this function.
arXiv Detail & Related papers (2023-11-17T15:22:59Z)
- Neural Sculpting: Uncovering hierarchically modular task structure in neural networks through pruning and network analysis [8.080026425139708]
We show that hierarchically modular neural networks offer benefits such as learning efficiency, generalization, multi-task learning, and transfer.
We propose an approach based on iterative unit and edge pruning (during training), combined with network analysis for module detection and hierarchy inference.
arXiv Detail & Related papers (2023-05-28T15:12:32Z)
- Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z)
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize Runge-Kutta neural network to a recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
- Meta-Causal Feature Learning for Out-of-Distribution Generalization [71.38239243414091]
This paper presents a balanced meta-causal learner (BMCL), which includes a balanced task generation module (BTG) and a meta-causal feature learning module (MCFL).
BMCL effectively identifies the class-invariant visual regions for classification and may serve as a general framework to improve the performance of the state-of-the-art methods.
arXiv Detail & Related papers (2022-08-22T09:07:02Z)
- Clustering units in neural networks: upstream vs downstream information [3.222802562733787]
We study modularity of hidden layer representations of feedforward, fully connected networks.
Surprisingly, we find that dropout dramatically increased modularity, while other forms of weight regularization had more modest effects.
This has important implications for representation-learning, as it suggests that finding modular representations that reflect structure in inputs may be a distinct goal from learning modular representations that reflect structure in outputs.
arXiv Detail & Related papers (2022-03-22T15:35:10Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers [84.57980167400513]
Most work on feed-forward networks that combine top-down and bottom-up feedback is limited to classification problems.
Neural Function Modules (NFM) aim to introduce this structural capability into deep learning more broadly.
The key contribution of our work is to combine attention, sparsity, and top-down and bottom-up feedback in a flexible algorithm.
arXiv Detail & Related papers (2020-10-15T20:43:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all information) and is not responsible for any consequences of its use.