Clustering units in neural networks: upstream vs downstream information
- URL: http://arxiv.org/abs/2203.11815v1
- Date: Tue, 22 Mar 2022 15:35:10 GMT
- Title: Clustering units in neural networks: upstream vs downstream information
- Authors: Richard D. Lange, David S. Rolnick, Konrad P. Kording
- Abstract summary: We study modularity of hidden layer representations of feedforward, fully connected networks.
We find two surprising results: first, dropout dramatically increased modularity, while other forms of weight regularization had more modest effects.
This has important implications for representation-learning, as it suggests that finding modular representations that reflect structure in inputs may be a distinct goal from learning modular representations that reflect structure in outputs.
- Score: 3.222802562733787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been hypothesized that some form of "modular" structure in artificial
neural networks should be useful for learning, compositionality, and
generalization. However, defining and quantifying modularity remains an open
problem. We cast the problem of detecting functional modules into the problem
of detecting clusters of similar-functioning units. This begs the question of
what makes two units functionally similar. For this, we consider two broad
families of methods: those that define similarity based on how units respond to
structured variations in inputs ("upstream"), and those based on how variations
in hidden unit activations affect outputs ("downstream"). We conduct an
empirical study quantifying modularity of hidden layer representations of
simple feedforward, fully connected networks, across a range of
hyperparameters. For each model, we quantify pairwise associations between
hidden units in each layer using a variety of both upstream and downstream
measures, then cluster them by maximizing their "modularity score" using
established tools from network science. We find two surprising results: first,
dropout dramatically increased modularity, while other forms of weight
regularization had more modest effects. Second, although we observe that there
is usually good agreement about clusters within both upstream methods and
downstream methods, there is little agreement about the cluster assignments
across these two families of methods. This has important implications for
representation-learning, as it suggests that finding modular representations
that reflect structure in inputs (e.g. disentanglement) may be a distinct goal
from learning modular representations that reflect structure in outputs (e.g.
compositionality).
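The clustering procedure described in the abstract (pairwise associations between hidden units, then partitioning by maximizing a modularity score with network-science tools) can be outlined with off-the-shelf libraries. The sketch below is not the authors' code: it assumes NumPy and networkx, uses absolute activation correlation as a stand-in for one possible upstream association measure, and the helper name `cluster_hidden_units` is hypothetical.

```python
# Minimal sketch (assumptions noted above), not the paper's released pipeline:
# cluster the hidden units of one layer by (1) building a pairwise-association
# matrix from their activations and (2) maximizing modularity over the
# resulting weighted graph.
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def cluster_hidden_units(activations: np.ndarray):
    """activations: array of shape (n_inputs, n_units) holding one layer's
    responses to a batch of inputs. Absolute correlation is used here as a
    simple "upstream" association; the paper compares several upstream and
    downstream measures."""
    assoc = np.abs(np.corrcoef(activations.T))  # (n_units, n_units)
    np.fill_diagonal(assoc, 0.0)                # ignore self-association

    graph = nx.from_numpy_array(assoc)          # weighted graph over units
    communities = greedy_modularity_communities(graph, weight="weight")
    return [sorted(c) for c in communities]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.standard_normal((512, 64))       # 512 inputs, 64 hidden units
    print(cluster_hidden_units(acts))
```

A downstream analogue would swap the activation correlations for similarities in how the units affect the outputs, leaving the modularity-maximization step unchanged.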
Related papers
- R-Cut: Enhancing Explainability in Vision Transformers with Relationship
Weighted Out and Cut [14.382326829600283]
We introduce two modules: the "Relationship Weighted Out" module and the "Cut" module.
The "Cut" module performs fine-grained feature decomposition, taking into account factors such as position, texture, and color.
We validate our method with extensive qualitative and quantitative experiments on the ImageNet dataset.
arXiv Detail & Related papers (2023-07-18T08:03:51Z) - Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z) - Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Robustness modularity in complex networks [1.749935196721634]
We propose a new measure based on the concept of robustness.
Robustness modularity is the probability of finding trivial partitions when the structure of the network is randomly perturbed (a toy illustration of this definition appears after this list).
Tests on artificial and real graphs reveal that robustness modularity can be used to assess and compare the strength of the community structure of different networks.
arXiv Detail & Related papers (2021-10-05T19:00:45Z) - LieTransformer: Equivariant self-attention for Lie Groups [49.9625160479096]
Group equivariant neural networks are used as building blocks of group invariant neural networks.
We extend the scope of the literature to self-attention, which is emerging as a prominent building block of deep learning models.
We propose the LieTransformer, an architecture composed of LieSelfAttention layers that are equivariant to arbitrary Lie groups and their discrete subgroups.
arXiv Detail & Related papers (2020-12-20T11:02:49Z) - Neural Function Modules with Sparse Arguments: A Dynamic Approach to
Integrating Information across Layers [84.57980167400513]
Most work on feed-forward networks that combine top-down and bottom-up feedback is limited to classification problems.
Neural Function Modules (NFM) aim to introduce this structural capability into deep learning more generally.
The key contribution of our work is to combine attention, sparsity, and top-down and bottom-up feedback in a flexible algorithm.
arXiv Detail & Related papers (2020-10-15T20:43:17Z) - A new nature inspired modularity function adapted for unsupervised
learning involving spatially embedded networks: A comparative analysis [0.0]
Unsupervised machine learning methods can be of great help in many traditional engineering disciplines.
We have compared the performance of our newly developed modularity function with some of the well-known modularity functions.
We show that, for the class of networks considered in this article, our method produces much better results than the competing methods.
arXiv Detail & Related papers (2020-07-18T04:32:14Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z) - A Deep Joint Sparse Non-negative Matrix Factorization Framework for
Identifying the Common and Subject-specific Functional Units of Tongue Motion
During Speech [7.870139900799612]
We develop a new deep learning framework to identify common and subject-specific functional units of tongue motion during speech.
We transform NMF with sparse and graph regularizations into modular architectures akin to deep neural networks.
arXiv Detail & Related papers (2020-07-09T15:05:44Z)
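For the robustness-modularity entry above, the quoted one-line definition (the probability of finding trivial partitions when the network's structure is randomly perturbed) can be illustrated with a toy Monte Carlo estimate. This is only an illustration of that sentence, not the paper's actual measure: the perturbation scheme (degree-preserving double edge swaps), the rewiring fraction, and the trivial-partition test are all assumptions.

```python
# Toy illustration (assumptions noted above) of "probability of finding trivial
# partitions when the structure of the network is randomly perturbed".
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def trivial_partition_probability(graph: nx.Graph, n_trials: int = 100,
                                  rewire_frac: float = 0.2, seed: int = 0) -> float:
    trivial = 0
    for t in range(n_trials):
        g = graph.copy()
        # Perturb the structure with degree-preserving random edge swaps.
        n_swaps = max(1, int(rewire_frac * g.number_of_edges()))
        nx.double_edge_swap(g, nswap=n_swaps, max_tries=100 * n_swaps, seed=seed + t)
        parts = greedy_modularity_communities(g)
        # "Trivial": one big community, or every node in its own community.
        if len(parts) == 1 or len(parts) == g.number_of_nodes():
            trivial += 1
    return trivial / n_trials


if __name__ == "__main__":
    # A graph with planted communities should rarely collapse to a trivial split.
    g = nx.planted_partition_graph(4, 25, p_in=0.5, p_out=0.02, seed=1)
    print(trivial_partition_probability(g, n_trials=20))
```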
This list is automatically generated from the titles and abstracts of the papers on this site.