Understanding polysemanticity in neural networks through coding theory
- URL: http://arxiv.org/abs/2401.17975v1
- Date: Wed, 31 Jan 2024 16:31:54 GMT
- Title: Understanding polysemanticity in neural networks through coding theory
- Authors: Simon C. Marshall and Jan H. Kirchner
- Abstract summary: We propose a novel practical approach to network interpretability and theoretical insights into polysemanticity and the density of codes.
We show how random projections can reveal whether a network exhibits a smooth or non-differentiable code and hence how interpretable the code is.
Our approach advances the pursuit of interpretability in neural networks, providing insights into their underlying structure and suggesting new avenues for circuit-level interpretability.
- Score: 0.8702432681310401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite substantial efforts, neural network interpretability remains an
elusive goal, with previous research failing to provide succinct explanations
of most single neurons' impact on the network output. This limitation is due to
the polysemantic nature of most neurons, whereby a given neuron is involved in
multiple unrelated network states, complicating the interpretation of that
neuron. In this paper, we apply tools developed in neuroscience and information
theory to propose both a novel practical approach to network interpretability
and theoretical insights into polysemanticity and the density of codes. We
infer levels of redundancy in the network's code by inspecting the
eigenspectrum of the activation's covariance matrix. Furthermore, we show how
random projections can reveal whether a network exhibits a smooth or
non-differentiable code and hence how interpretable the code is. This same
framework explains the advantages of polysemantic neurons to learning
performance and explains trends found in recent results by Elhage et
al.~(2022). Our approach advances the pursuit of interpretability in neural
networks, providing insights into their underlying structure and suggesting new
avenues for circuit-level interpretability.
Related papers
- Interpreting Neural Networks through Mahalanobis Distance [0.0]
This paper introduces a theoretical framework that connects neural network linear layers with the Mahalanobis distance.
Although this work is theoretical and does not include empirical data, the proposed distance-based interpretation has the potential to enhance model robustness, improve generalization, and provide more intuitive explanations of neural network decisions.
arXiv Detail & Related papers (2024-10-25T07:21:44Z) - Statistical tuning of artificial neural network [0.0]
This study introduces methods to enhance the understanding of neural networks, focusing specifically on models with a single hidden layer.
We propose statistical tests to assess the significance of input neurons and introduce algorithms for dimensionality reduction.
This research advances the field of Explainable Artificial Intelligence by presenting robust statistical frameworks for interpreting neural networks.
arXiv Detail & Related papers (2024-09-24T19:47:03Z) - Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Automated Natural Language Explanation of Deep Visual Neurons with Large
Models [43.178568768100305]
This paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models.
Our framework is designed to be compatible with various model architectures and datasets, automated and scalable neuron interpretation.
arXiv Detail & Related papers (2023-10-16T17:04:51Z) - DISCOVER: Making Vision Networks Interpretable via Competition and
Dissection [11.028520416752325]
This work contributes to post-hoc interpretability, and specifically Network Dissection.
Our goal is to present a framework that makes it easier to discover the individual functionality of each neuron in a network trained on a vision task.
arXiv Detail & Related papers (2023-10-07T21:57:23Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
Rank of neural networks measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Searching for the Essence of Adversarial Perturbations [73.96215665913797]
We show that adversarial perturbations contain human-recognizable information, which is the key conspirator responsible for a neural network's erroneous prediction.
This concept of human-recognizable information allows us to explain key features related to adversarial perturbations.
arXiv Detail & Related papers (2022-05-30T18:04:57Z) - On 1/n neural representation and robustness [13.491651740693705]
We show that imposing the experimentally observed structure on artificial neural networks makes them more robust to adversarial attacks.
Our findings complement the existing theory relating wide neural networks to kernel methods.
arXiv Detail & Related papers (2020-12-08T20:34:49Z) - A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
arXiv Detail & Related papers (2020-06-30T14:46:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.