Grokking Modular Polynomials
- URL: http://arxiv.org/abs/2406.03495v1
- Date: Wed, 5 Jun 2024 17:59:35 GMT
- Title: Grokking Modular Polynomials
- Authors: Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov,
- Abstract summary: We extend the class of analytical solutions to include modular multiplication as well as modular addition with many terms.
We show that real networks trained on these datasets learn similar solutions upon generalization (grokking)
We hypothesize a classification of modulars into learnable and non-learnable via neural networks training.
- Score: 5.358878931933351
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest. This limitation remains unmoved by the choice of architecture and training strategies. On the other hand, an analytical solution for the weights of Multi-layer Perceptron (MLP) networks that generalize on the modular addition task is known in the literature. In this work, we (i) extend the class of analytical solutions to include modular multiplication as well as modular addition with many terms. Additionally, we show that real networks trained on these datasets learn similar solutions upon generalization (grokking). (ii) We combine these "expert" solutions to construct networks that generalize on arbitrary modular polynomials. (iii) We hypothesize a classification of modular polynomials into learnable and non-learnable via neural networks training; and provide experimental evidence supporting our claims.
Related papers
- Breaking Neural Network Scaling Laws with Modularity [8.482423139660153]
We show how the amount of training data required to generalize varies with the intrinsic dimensionality of a task's input.
We then develop a novel learning rule for modular networks to exploit this advantage.
arXiv Detail & Related papers (2024-09-09T16:43:09Z) - Configurable Foundation Models: Building LLMs from a Modular Perspective [115.63847606634268]
A growing tendency to decompose LLMs into numerous functional modules allows for inference with part of modules and dynamic assembly of modules to tackle complex tasks.
We coin the term brick to represent each functional module, designating the modularized structure as customizable foundation models.
We present four brick-oriented operations: retrieval and routing, merging, updating, and growing.
We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions.
arXiv Detail & Related papers (2024-09-04T17:01:02Z) - A Neural Rewriting System to Solve Algorithmic Problems [47.129504708849446]
We propose a modular architecture designed to learn a general procedure for solving nested mathematical formulas.
Inspired by rewriting systems, a classic framework in symbolic artificial intelligence, we include in the architecture three specialized and interacting modules.
We benchmark our system against the Neural Data Router, a recent model specialized for systematic generalization, and a state-of-the-art large language model (GPT-4) probed with advanced prompting strategies.
arXiv Detail & Related papers (2024-02-27T10:57:07Z) - Discovering modular solutions that generalize compositionally [55.46688816816882]
We show that identification up to linear transformation purely from demonstrations is possible without having to learn an exponential number of module combinations.
We further demonstrate empirically that meta-learning from finite data can discover modular policies that generalize compositionally in a number of complex environments.
arXiv Detail & Related papers (2023-12-22T16:33:50Z) - Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z) - Grokking modular arithmetic [0.0]
We present a simple neural network that can learn modular arithmetic tasks.
We show that the network exhibits a sudden jump in generalization known as grokking''
arXiv Detail & Related papers (2023-01-06T19:00:01Z) - Is a Modular Architecture Enough? [80.32451720642209]
We provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions.
We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems.
arXiv Detail & Related papers (2022-06-06T16:12:06Z) - Neural Function Modules with Sparse Arguments: A Dynamic Approach to
Integrating Information across Layers [84.57980167400513]
Neural Function Modules (NFM) aims to introduce the same structural capability into deep learning.
Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems.
The key contribution of our work is to combine attention, sparsity, top-down and bottom-up feedback, in a flexible algorithm.
arXiv Detail & Related papers (2020-10-15T20:43:17Z) - Are Neural Nets Modular? Inspecting Functional Modularity Through
Differentiable Weight Masks [10.0444013205203]
Understanding if and how NNs are modular could provide insights into how to improve them.
Current inspection methods, however, fail to link modules to their functionality.
arXiv Detail & Related papers (2020-10-05T15:04:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.