Emergent Modularity in Pre-trained Transformers
- URL: http://arxiv.org/abs/2305.18390v2
- Date: Mon, 30 Oct 2023 07:40:35 GMT
- Title: Emergent Modularity in Pre-trained Transformers
- Authors: Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang,
Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou
- Abstract summary: We consider two main characteristics of modularity: functional specialization of neurons and function-based neuron grouping.
We study how modularity emerges during pre-training, and find that the modular structure is stabilized at the early stage.
This suggests that Transformers first construct the modular structure and then learn fine-grained neuron functions.
- Score: 127.08792763817496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work examines the presence of modularity in pre-trained Transformers, a
feature commonly found in human brains and thought to be vital for general
intelligence. In analogy to human brains, we consider two main characteristics
of modularity: (1) functional specialization of neurons: we evaluate whether
each neuron is mainly specialized in a certain function, and find that the
answer is yes. (2) function-based neuron grouping: we explore finding a
structure that groups neurons into modules by function, and each module works
for its corresponding function. Given the enormous amount of possible
structures, we focus on Mixture-of-Experts as a promising candidate, which
partitions neurons into experts and usually activates different experts for
different inputs. Experimental results show that there are functional experts,
in which the neurons specialized in a certain function are clustered. Moreover,
perturbing the activations of functional experts significantly affects the
corresponding function. Finally, we study how modularity emerges during
pre-training, and find that the modular structure is stabilized at the early
stage, which is faster than neuron stabilization. This suggests that Transformers
first construct the modular structure and then learn fine-grained neuron
functions. Our code and data are available at
https://github.com/THUNLP/modularity-analysis.
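To make the setup above concrete, the sketch below is a minimal illustration (not the authors' released code; see the repository linked above) of a standard Transformer feed-forward layer whose hidden neurons are partitioned into fixed expert groups, with a helper that zeroes out one group's activations so the effect of perturbing a "functional expert" on the layer output can be probed. The layer sizes, the contiguous partition, and the `ablate_expert` argument are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PartitionedFFN(nn.Module):
    """A standard Transformer FFN whose hidden neurons are split into
    fixed expert groups, so one group's activations can be ablated."""

    def __init__(self, d_model=768, d_hidden=3072, n_experts=8):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)
        self.act = nn.GELU()
        # Illustrative partition: contiguous, equally sized expert groups.
        # The paper groups neurons by function; this is only a placeholder.
        self.expert_ids = torch.arange(d_hidden) // (d_hidden // n_experts)

    def forward(self, x, ablate_expert=None):
        h = self.act(self.up(x))
        if ablate_expert is not None:
            # Perturb one "expert" by zeroing its neurons' activations.
            mask = (self.expert_ids != ablate_expert).to(h.dtype)
            h = h * mask
        return self.down(h)

# Usage: compare outputs with and without ablating one expert group.
ffn = PartitionedFFN()
x = torch.randn(4, 16, 768)          # (batch, seq_len, d_model)
out_full = ffn(x)
out_ablated = ffn(x, ablate_expert=3)
print((out_full - out_ablated).abs().mean())
```

In the paper's analysis, a large output change when a given expert is ablated on inputs of a particular function (and a small change otherwise) is the kind of evidence used to call that expert "functional".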
Related papers
- Brain-like Functional Organization within Large Language Models [58.93629121400745]
The human brain has long inspired the pursuit of artificial intelligence (AI).
Recent neuroimaging studies provide compelling evidence of alignment between the computational representation of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli.
In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs).
This framework links the artificial neuron sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs).
arXiv Detail & Related papers (2024-10-25T13:15:17Z) - Don't Cut Corners: Exact Conditions for Modularity in Biologically Inspired Representations [52.48094670415497]
We develop a theory of when biologically inspired representations modularise with respect to source variables (sources).
We derive necessary and sufficient conditions on a sample of sources that determine whether the neurons in an optimal biologically-inspired linear autoencoder modularise.
Our theory applies to any dataset, extending far beyond the case of statistical independence studied in previous work.
arXiv Detail & Related papers (2024-10-08T17:41:37Z) - No One-Size-Fits-All Neurons: Task-based Neurons for Artificial Neural Networks [25.30801109401654]
Since the human brain uses task-based neurons, can artificial network design move from task-based architecture design to task-based neuron design?
We propose a two-step framework for prototyping task-based neurons.
Experiments show that the proposed task-based neuron design is not only feasible but also delivers performance competitive with other state-of-the-art models.
arXiv Detail & Related papers (2024-05-03T09:12:46Z) - Modular Boundaries in Recurrent Neural Networks [39.626497874552555]
We use a community detection method from network science known as modularity to partition neurons into distinct modules.
These partitions allow us to ask the following question: do these modular boundaries matter to the system? (A generic sketch of this kind of neuron partitioning appears after this list.)
arXiv Detail & Related papers (2023-10-31T16:37:01Z) - Seeing is Believing: Brain-Inspired Modular Training for Mechanistic
Interpretability [5.15188009671301]
Brain-Inspired Modular Training is a method for making neural networks more modular and interpretable.
BIMT embeds neurons in a geometric space and augments the loss function with a cost proportional to the length of each neuron connection.
arXiv Detail & Related papers (2023-05-04T17:56:42Z) - Neural Estimation of Submodular Functions with Applications to
Differentiable Subset Selection [50.14730810124592]
Submodular functions and variants, through their ability to characterize diversity and coverage, have emerged as a key tool for data selection and summarization.
We propose FLEXSUBNET, a family of flexible neural models for both monotone and non-monotone submodular functions.
arXiv Detail & Related papers (2022-10-20T06:00:45Z) - Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z) - DeepRetinotopy: Predicting the Functional Organization of Human Visual
Cortex from Structural MRI Data using Geometric Deep Learning [125.99533416395765]
We developed a deep learning model capable of exploiting the structure of the cortex to learn the complex relationship between brain function and anatomy from structural and functional MRI data.
Our model was able to predict the functional organization of human visual cortex from anatomical properties alone, and it was also able to predict nuanced variations across individuals.
arXiv Detail & Related papers (2020-05-26T04:54:31Z)
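As a generic illustration of the neuron-partitioning idea mentioned in the "Modular Boundaries in Recurrent Neural Networks" entry above, the sketch below applies modularity-based community detection (networkx's greedy modularity maximization) to a toy neuron correlation graph. The random activity matrix and the thresholding of positive correlations are illustrative assumptions, not that paper's actual procedure.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy stand-in for neuron activity: 64 neurons observed over 500 inputs.
rng = np.random.default_rng(0)
activity = rng.normal(size=(64, 500))

# Build a weighted graph whose edges are positive activation correlations.
corr = np.corrcoef(activity)
graph = nx.Graph()
n = corr.shape[0]
for i in range(n):
    for j in range(i + 1, n):
        if corr[i, j] > 0:
            graph.add_edge(i, j, weight=corr[i, j])

# Partition neurons into modules by greedy modularity maximization.
modules = greedy_modularity_communities(graph, weight="weight")
print([sorted(m) for m in modules])
```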
This list is automatically generated from the titles and abstracts of the papers on this site.