Pruned Neural Networks are Surprisingly Modular
- URL: http://arxiv.org/abs/2003.04881v6
- Date: Mon, 7 Feb 2022 21:22:13 GMT
- Title: Pruned Neural Networks are Surprisingly Modular
- Authors: Daniel Filan, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell
- Abstract summary: We introduce a measurable notion of modularity for multi-layer perceptrons.
We investigate the modular structure of neural networks trained on datasets of small images.
- Score: 9.184659875364689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The learned weights of a neural network are often considered devoid of
scrutable internal structure. To discern structure in these weights, we
introduce a measurable notion of modularity for multi-layer perceptrons (MLPs),
and investigate the modular structure of MLPs trained on datasets of small
images. Our notion of modularity comes from the graph clustering literature: a
"module" is a set of neurons with strong internal connectivity but weak
external connectivity. We find that training and weight pruning produces MLPs
that are more modular than randomly initialized ones, and often significantly
more modular than random MLPs with the same (sparse) distribution of weights.
Interestingly, they are much more modular when trained with dropout. We also
present exploratory analyses of the importance of different modules for
performance and how modules depend on each other. Understanding the modular
structure of neural networks, when such structure exists, will hopefully render
their inner workings more interpretable to engineers. Note that this paper has
been superseded by "Clusterability in Neural Networks", arxiv:2103.03386 and
"Quantifying Local Specialization in Deep Neural Networks", arxiv:2110.08058!
Related papers
- Breaking Neural Network Scaling Laws with Modularity [8.482423139660153]
We show how the amount of training data required to generalize varies with the intrinsic dimensionality of a task's input.
We then develop a novel learning rule for modular networks to exploit this advantage.
arXiv Detail & Related papers (2024-09-09T16:43:09Z)
- Modular Growth of Hierarchical Networks: Efficient, General, and Robust Curriculum Learning [0.0]
We show that for a given classical, non-modular recurrent neural network (RNN), an equivalent modular network will perform better across multiple metrics.
We demonstrate that the inductive bias introduced by the modular topology is strong enough for the network to perform well even when the connectivity within modules is fixed.
Our findings suggest that gradual modular growth of RNNs could provide advantages for learning increasingly complex tasks on evolutionary timescales.
arXiv Detail & Related papers (2024-06-10T13:44:07Z)
- Modular Boundaries in Recurrent Neural Networks [39.626497874552555]
We use a community detection method from network science known as modularity to partition neurons into distinct modules.
These partitions allow us to ask the following question: do these modular boundaries matter to the system?
arXiv Detail & Related papers (2023-10-31T16:37:01Z)
- ModuleFormer: Modularity Emerges from Mixture-of-Experts [60.6148988099284]
This paper proposes a new neural network architecture, ModuleFormer, to improve the efficiency and flexibility of large language models.
Unlike the previous SMoE-based modular language model, ModuleFormer can induce modularity from uncurated data.
arXiv Detail & Related papers (2023-06-07T17:59:57Z)
- Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z)
- Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose, yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z)
- Is a Modular Architecture Enough? [80.32451720642209]
We provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions.
We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems.
arXiv Detail & Related papers (2022-06-06T16:12:06Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Robustness modularity in complex networks [1.749935196721634]
We propose a new measure based on the concept of robustness.
Robustness modularity is the probability of finding trivial partitions when the structure of the network is randomly perturbed.
Tests on artificial and real graphs reveal that robustness modularity can be used to assess and compare the strength of the community structure of different networks.
arXiv Detail & Related papers (2021-10-05T19:00:45Z)
- Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks [10.0444013205203]
Understanding if and how NNs are modular could provide insights into how to improve them.
Current inspection methods, however, fail to link modules to their functionality.
arXiv Detail & Related papers (2020-10-05T15:04:11Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.