Related papers: Interpreting ResNet-based CLIP via Neuron-Attention Decomposition

Interpreting ResNet-based CLIP via Neuron-Attention Decomposition

URL: http://arxiv.org/abs/2509.19943v2
Date: Thu, 25 Sep 2025 07:27:10 GMT
Title: Interpreting ResNet-based CLIP via Neuron-Attention Decomposition
Authors: Edmund Bu, Yossi Gandelsman,
Abstract summary: We present a novel technique for interpreting the neurons in CLIP-ResNet by decomposing their contributions to the output into individual computation paths.<n>We find that these neuron-head pairs can be approximated by a single direction in CLIP-ResNet's image-text embedding space.<n>We employ the pairs for training-free semantic segmentation, outperforming previous methods for CLIP-ResNet.
Score: 17.65407773911596
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a novel technique for interpreting the neurons in CLIP-ResNet by decomposing their contributions to the output into individual computation paths. More specifically, we analyze all pairwise combinations of neurons and the following attention heads of CLIP's attention-pooling layer. We find that these neuron-head pairs can be approximated by a single direction in CLIP-ResNet's image-text embedding space. Leveraging this insight, we interpret each neuron-head pair by associating it with text. Additionally, we find that only a sparse set of the neuron-head pairs have a significant contribution to the output value, and that some neuron-head pairs, while polysemantic, represent sub-concepts of their corresponding neurons. We use these observations for two applications. First, we employ the pairs for training-free semantic segmentation, outperforming previous methods for CLIP-ResNet. Second, we utilize the contributions of neuron-head pairs to monitor dataset distribution shifts. Our results demonstrate that examining individual computation paths in neural networks uncovers interpretable units, and that such units can be utilized for downstream tasks.

Related papers

NeurFlow: Interpreting Neural Networks through Neuron Groups and Functional Interactions [16.00223741620103]
We propose a novel framework that transitions the focus from analyzing individual neurons to investigating groups of neurons.<n>Our automated framework, NeurFlow, first identifies core neurons and clusters them into groups based on shared functional relationships.
arXiv Detail & Related papers (2025-02-22T06:01:03Z)
Interpreting the Second-Order Effects of Neurons in CLIP [73.54377859089801]
We interpret the function of individual neurons in CLIP by automatically describing them using text.<n>We present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output.<n>Our results indicate that an automated interpretation of neurons can be used for model deception and for introducing new model capabilities.
arXiv Detail & Related papers (2024-06-06T17:59:52Z)
Detecting Modularity in Deep Neural Networks [8.967870619902211]
We consider the problem of assessing the modularity exhibited by a partitioning of a network's neurons. We propose two proxies for this: importance, which reflects how crucial sets of neurons are to network performance; and coherence, which reflects how consistently their neurons associate with features of the inputs. We show that these partitionings, even ones based only on weights, reveal groups of neurons that are important and coherent.
arXiv Detail & Related papers (2021-10-13T20:33:30Z)
Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay [4.042159113348107]
We first consider the case of single neuron and show that the linear approximability, quantified by the Kolmogorov width, is controlled by the eigenvalue decay of an associate kernel. We show that similar results also hold for two-layer neural networks.
arXiv Detail & Related papers (2021-08-10T23:30:29Z)
And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
Presence of sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks. We define AND-like neurons and propose measures to increase their proportion in the network. Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
Neuron-based explanations of neural networks sacrifice completeness and interpretability [67.53271920386851]
We show that for AlexNet pretrained on ImageNet, neuron-based explanation methods sacrifice both completeness and interpretability.<n>We show the most important principal components provide more complete and interpretable explanations than the most important neurons.<n>Our findings suggest that explanation methods for networks like AlexNet should avoid using neurons as a basis for embeddings.
arXiv Detail & Related papers (2020-11-05T21:26:03Z)
Training Binary Neural Networks through Learning with Noisy Supervision [76.26677550127656]
This paper formalizes the binarization operations over neural networks from a learning perspective. Experimental results on benchmark datasets indicate that the proposed binarization technique attains consistent improvements over baselines.
arXiv Detail & Related papers (2020-10-10T01:59:39Z)
Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts. We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
On Tractable Representations of Binary Neural Networks [23.50970665150779]
We consider the compilation of a binary neural network's decision function into tractable representations such as Ordered Binary Decision Diagrams (OBDDs) and Sentential Decision Diagrams (SDDs) In experiments, we show that it is feasible to obtain compact representations of neural networks as SDDs.
arXiv Detail & Related papers (2020-04-05T03:21:26Z)
Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy. We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.