Understanding polysemanticity in neural networks through coding theory
- URL: http://arxiv.org/abs/2401.17975v1
- Date: Wed, 31 Jan 2024 16:31:54 GMT
- Title: Understanding polysemanticity in neural networks through coding theory
- Authors: Simon C. Marshall and Jan H. Kirchner
- Abstract summary: We propose a novel practical approach to network interpretability and theoretical insights into polysemanticity and the density of codes.
We show how random projections can reveal whether a network exhibits a smooth or non-differentiable code and hence how interpretable the code is.
Our approach advances the pursuit of interpretability in neural networks, providing insights into their underlying structure and suggesting new avenues for circuit-level interpretability.
- Score: 0.8702432681310401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite substantial efforts, neural network interpretability remains an
elusive goal, with previous research failing to provide succinct explanations
of most single neurons' impact on the network output. This limitation is due to
the polysemantic nature of most neurons, whereby a given neuron is involved in
multiple unrelated network states, complicating the interpretation of that
neuron. In this paper, we apply tools developed in neuroscience and information
theory to propose both a novel practical approach to network interpretability
and theoretical insights into polysemanticity and the density of codes. We
infer levels of redundancy in the network's code by inspecting the
eigenspectrum of the activation's covariance matrix. Furthermore, we show how
random projections can reveal whether a network exhibits a smooth or
non-differentiable code and hence how interpretable the code is. This same
framework explains the advantages of polysemantic neurons to learning
performance and explains trends found in recent results by Elhage et
al. (2022). Our approach advances the pursuit of interpretability in neural
networks, providing insights into their underlying structure and suggesting new
avenues for circuit-level interpretability.
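The eigenspectrum inspection mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the synthetic activations, the helper names `covariance_eigenspectrum` and `participation_ratio`, and the use of the participation ratio as a one-number summary of redundancy are all assumptions made here for illustration. The idea is only that a redundant code concentrates variance in a few eigenvalues, whereas a dense code spreads it across many.

```python
import numpy as np


def covariance_eigenspectrum(activations):
    """Eigenvalues of the activation covariance matrix, largest first.

    `activations` has shape (n_samples, n_neurons).
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (activations.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)  # returned in ascending order
    # Clip tiny negative values caused by floating-point error.
    return np.clip(eigvals[::-1], 0.0, None)


def participation_ratio(eigvals):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues).

    A rough "effective dimensionality"; an illustrative summary chosen
    for this sketch, not a quantity taken from the paper.
    """
    return eigvals.sum() ** 2 / np.square(eigvals).sum()


rng = np.random.default_rng(0)
n_samples, n_neurons = 2000, 64

# Hypothetical redundant code: 4 latent features mixed linearly into 64 neurons.
latents = rng.normal(size=(n_samples, 4))
redundant = latents @ rng.normal(size=(4, n_neurons))

# Hypothetical dense code: every neuron carries independent variance.
dense = rng.normal(size=(n_samples, n_neurons))

pr_redundant = participation_ratio(covariance_eigenspectrum(redundant))
pr_dense = participation_ratio(covariance_eigenspectrum(dense))

print(f"redundant code: effective dimensions ~ {pr_redundant:.1f} of {n_neurons}")
print(f"dense code:     effective dimensions ~ {pr_dense:.1f} of {n_neurons}")
```

In practice the `activations` array would hold a layer's recorded activations over a dataset rather than synthetic draws; plotting the sorted eigenvalues on log-log axes shows the decay of the spectrum directly.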
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Automated Natural Language Explanation of Deep Visual Neurons with Large Models [43.178568768100305]
This paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models.
Our framework is designed to be compatible with various model architectures and datasets, enabling automated and scalable neuron interpretation.
arXiv Detail & Related papers (2023-10-16T17:04:51Z)
- DISCOVER: Making Vision Networks Interpretable via Competition and Dissection [11.028520416752325]
This work contributes to post-hoc interpretability, and specifically Network Dissection.
Our goal is to present a framework that makes it easier to discover the individual functionality of each neuron in a network trained on a vision task.
arXiv Detail & Related papers (2023-10-07T21:57:23Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Seeking Interpretability and Explainability in Binary Activated Neural Networks [2.828173677501078]
We study the use of binary activated neural networks as interpretable and explainable predictors in the context of regression tasks.
We present an approach based on the efficient computation of SHAP values for quantifying the relative importance of the features, hidden neurons and even weights.
arXiv Detail & Related papers (2022-09-07T20:11:17Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
Rank of neural networks measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Searching for the Essence of Adversarial Perturbations [73.96215665913797]
We show that adversarial perturbations contain human-recognizable information, which is the key conspirator responsible for a neural network's erroneous prediction.
This concept of human-recognizable information allows us to explain key features related to adversarial perturbations.
arXiv Detail & Related papers (2022-05-30T18:04:57Z)
- NFT-K: Non-Fungible Tangent Kernels [23.93508901712177]
We develop a new network as a combination of multiple neural tangent kernels, one to model each layer of the deep neural network individually.
We demonstrate the interpretability of this model on two datasets, showing that the multiple kernels model elucidates the interplay between the layers and predictions.
arXiv Detail & Related papers (2021-10-11T00:35:47Z)
- On 1/n neural representation and robustness [13.491651740693705]
We show that imposing the experimentally observed structure on artificial neural networks makes them more robust to adversarial attacks.
Our findings complement the existing theory relating wide neural networks to kernel methods.
arXiv Detail & Related papers (2020-12-08T20:34:49Z)
- A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
arXiv Detail & Related papers (2020-06-30T14:46:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.