Towards Generating Informative Textual Description for Neurons in Language Models
- URL: http://arxiv.org/abs/2401.16731v1
- Date: Tue, 30 Jan 2024 04:06:25 GMT
- Title: Towards Generating Informative Textual Description for Neurons in Language Models
- Authors: Shrayani Mondal, Rishabh Garodia, Arbaaz Qureshi, Taesung Lee and Youngja Park
- Abstract summary: We propose a framework that ties textual descriptions to neurons. In particular, our experiment shows that the proposed approach achieves 75% precision@2 and 50% recall@2.
- Score: 6.884227665279812
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent developments in transformer-based language models have allowed them to capture a wide variety of world knowledge that can be adapted to downstream tasks with limited resources. However, which pieces of information these models understand is unclear, and neuron-level contributions to identifying them are largely unknown. Conventional approaches to neuron explainability either depend on a finite set of pre-defined descriptors or require manual annotations for training a secondary model that can then explain the neurons of the primary model. In this paper, taking BERT as an example, we remove these constraints and propose a novel, scalable framework that ties textual descriptions to neurons. We leverage the potential of generative language models to discover human-interpretable descriptors present in a dataset, and we use an unsupervised approach to explain neurons with these descriptors. Through various qualitative and quantitative analyses, we demonstrate that this framework generates useful data-specific descriptors, and identifies the neurons that encode them, with little human involvement. In particular, our experiment shows that the proposed approach achieves 75% precision@2 and 50% recall@2.
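For context on these metrics: for each neuron, precision@2 is the fraction of its top-2 predicted descriptors that match the ground truth, and recall@2 is the fraction of ground-truth descriptors recovered within the top 2. A minimal sketch of such an evaluation, assuming per-neuron ranked descriptor lists and reference sets (the names and data layout below are illustrative, not the authors' code):

```python
from typing import Dict, List, Set, Tuple

def precision_recall_at_k(
    ranked: Dict[int, List[str]],       # neuron id -> predicted descriptors, best first
    ground_truth: Dict[int, Set[str]],  # neuron id -> reference descriptors
    k: int = 2,
) -> Tuple[float, float]:
    """Average precision@k and recall@k over all neurons with ground truth."""
    precisions, recalls = [], []
    for neuron, gold in ground_truth.items():
        top_k = ranked.get(neuron, [])[:k]
        hits = sum(1 for d in top_k if d in gold)
        precisions.append(hits / k)
        recalls.append(hits / len(gold) if gold else 0.0)
    n = len(ground_truth)
    return sum(precisions) / n, sum(recalls) / n

# Toy check: one gold descriptor of two appears in the top-2 list,
# so this neuron scores precision@2 = 0.5 and recall@2 = 0.5.
p, r = precision_recall_at_k({7: ["negation", "past tense"]},
                             {7: {"negation", "sentiment"}})
print(p, r)  # 0.5 0.5
```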
Related papers
- A generative framework to bridge data-driven models and scientific theories in language neuroscience [84.76462599023802]
We present generative explanation-mediated validation, a framework for generating concise explanations of language selectivity in the brain.
We show that explanatory accuracy is closely related to the predictive power and stability of the underlying statistical models.
arXiv Detail & Related papers (2024-10-01T15:57:48Z)
- Describe-and-Dissect: Interpreting Neurons in Vision Networks with Language Models [9.962488213825859]
Describe-and-Dissect (DnD) is a novel method to describe the roles of hidden neurons in vision networks.
DnD produces complex natural language descriptions without the need for labeled training data or a predefined set of concepts.
arXiv Detail & Related papers (2024-03-20T17:33:02Z)
- Investigating the Encoding of Words in BERT's Neurons using Feature Textualization [11.943486282441143]
We propose a technique to produce representations of neurons in word embedding space.
We find that the produced representations can provide insights about the encoded knowledge in individual neurons.
arXiv Detail & Related papers (2023-11-14T15:21:49Z)
- Automated Natural Language Explanation of Deep Visual Neurons with Large Models [43.178568768100305]
This paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models.
Our framework is designed to be compatible with various model architectures and datasets, and to provide automated and scalable neuron interpretation.
arXiv Detail & Related papers (2023-10-16T17:04:51Z)
- On Model Explanations with Transferable Neural Pathways [41.2093021477798]
We propose a Generative Class-relevant Neural Pathway (GEN-CNP) model that learns to predict the neural pathways from the target model's feature maps.
We propose to transfer the class-relevant neural pathways to explain samples of the same class and show experimentally and qualitatively their faithfulness and interpretability.
arXiv Detail & Related papers (2023-09-18T15:50:38Z)
- N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models [0.0]
N2G is a tool which takes a neuron and its dataset examples, and automatically distills the neuron's behaviour on those examples to an interpretable graph.
We use truncation and saliency methods to only present the important tokens, and augment the dataset examples with more diverse samples to better capture the extent of neuron behaviour.
These graphs can be visualised to aid manual interpretation by researchers, but can also output token activations on text to compare to the neuron's ground truth activations for automatic validation.
arXiv Detail & Related papers (2023-04-22T19:06:13Z)
- NeuroExplainer: Fine-Grained Attention Decoding to Uncover Cortical Development Patterns of Preterm Infants [73.85768093666582]
We propose an explainable geometric deep network dubbed NeuroExplainer.
NeuroExplainer is used to uncover altered infant cortical development patterns associated with preterm birth.
arXiv Detail & Related papers (2023-01-01T12:48:12Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- Generalizable Neuro-symbolic Systems for Commonsense Question Answering [67.72218865519493]
This chapter illustrates how suitable neuro-symbolic models for language understanding can enable domain generalizability and robustness in downstream tasks.
Different methods for integrating neural language models and knowledge graphs are discussed.
arXiv Detail & Related papers (2022-01-17T06:13:37Z)
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well those predictions match reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing; a sketch of the IoU matching underlying such procedures follows below.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
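To make the last entry concrete, here is a rough sketch of the IoU-based matching that compositional explanation procedures of this kind rely on: a neuron's binarized activations are scored against logical combinations of concept masks, and the best-scoring formula is reported as the explanation. The random masks and the exhaustive two-concept search are simplifying assumptions; the original procedure searches over longer formulas.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

# Binarized neuron activations and candidate concept masks over the same inputs.
rng = np.random.default_rng(0)
neuron = rng.random(1000) > 0.7                       # neuron fires on ~30% of inputs
concepts = {c: rng.random(1000) > 0.5 for c in ["water", "blue", "river"]}

# Enumerate simple compositions: single concepts plus pairwise AND / OR.
candidates = dict(concepts)
for c1, m1 in concepts.items():
    for c2, m2 in concepts.items():
        if c1 < c2:
            candidates[f"({c1} AND {c2})"] = m1 & m2
            candidates[f"({c1} OR {c2})"] = m1 | m2

# The highest-IoU formula is taken as the neuron's explanation.
best = max(candidates, key=lambda name: iou(neuron, candidates[name]))
print(best, round(iou(neuron, candidates[best]), 3))
```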
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.