Modular Training of Neural Networks aids Interpretability
- URL: http://arxiv.org/abs/2502.02470v2
- Date: Fri, 07 Feb 2025 01:40:17 GMT
- Title: Modular Training of Neural Networks aids Interpretability
- Authors: Satvik Golechha, Maheep Chaudhary, Joan Velja, Alessandro Abate, Nandi Schoots,
- Abstract summary: We define a measure for clusterability and show that pre-trained models form highly enmeshed clusters via spectral graph clustering.
Using automated interpretability techniques, we show that our method can help train models that are more modular and learn different, disjoint, and smaller circuits.
- Score: 45.8172254436063
- License:
- Abstract: An approach to improve neural network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We define a measure for clusterability and show that pre-trained models form highly enmeshed clusters via spectral graph clustering. We thus train models to be more modular using a "clusterability loss" function that encourages the formation of non-interacting clusters. Using automated interpretability techniques, we show that our method can help train models that are more modular and learn different, disjoint, and smaller circuits. We investigate CNNs trained on MNIST and CIFAR, small transformers trained on modular addition, and language models. Our approach provides a promising direction for training neural networks that learn simpler functions and are easier to interpret.
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network)
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Training Neural Networks for Modularity aids Interpretability [0.6749750044497732]
An approach to improve network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently.
We find pretrained models to be highly unclusterable and thus train models to be more modular using an enmeshment loss'' function that encourages the formation of non-interacting clusters.
arXiv Detail & Related papers (2024-09-24T05:03:49Z) - Modular Growth of Hierarchical Networks: Efficient, General, and Robust Curriculum Learning [0.0]
We show that for a given classical, non-modular recurrent neural network (RNN), an equivalent modular network will perform better across multiple metrics.
We demonstrate that the inductive bias introduced by the modular topology is strong enough for the network to perform well even when the connectivity within modules is fixed.
Our findings suggest that gradual modular growth of RNNs could provide advantages for learning increasingly complex tasks on evolutionary timescales.
arXiv Detail & Related papers (2024-06-10T13:44:07Z) - Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning [38.09011520275557]
Class-incremental learning (CIL) aims to train a model to learn new classes from non-stationary data streams without forgetting old ones.
We propose a new kind of connectionist model by tailoring neural unit dynamics that adapt the behavior of neural networks for CIL.
arXiv Detail & Related papers (2024-06-04T15:47:03Z) - Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models [31.960749305728488]
We introduce a novel concept dubbed modular neural tangent kernel (mNTK)
We show that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue $lambda_max$.
We propose a novel training strategy termed Modular Adaptive Training (MAT) to update those modules with their $lambda_max$ exceeding a dynamic threshold.
arXiv Detail & Related papers (2024-05-13T07:46:48Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split-off from the trained full network with remarkable good performance.
We show that training a Transformer with a low-rank core gives a low-rank model with superior performance than when training the low-rank model alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z) - S2RMs: Spatially Structured Recurrent Modules [105.0377129434636]
We take a step towards exploiting dynamic structure that are capable of simultaneously exploiting both modular andtemporal structures.
We find our models to be robust to the number of available views and better capable of generalization to novel tasks without additional training.
arXiv Detail & Related papers (2020-07-13T17:44:30Z) - Distributed Training of Deep Learning Models: A Taxonomic Perspective [11.924058430461216]
Distributed deep learning systems (DDLS) train deep neural network models by utilizing the distributed resources of a cluster.
We aim to shine some light on the fundamental principles that are at work when training deep neural networks in a cluster of independent machines.
arXiv Detail & Related papers (2020-07-08T08:56:58Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.