Investigating the Encoding of Words in BERT's Neurons using Feature Textualization
- URL: http://arxiv.org/abs/2311.08240v1
- Date: Tue, 14 Nov 2023 15:21:49 GMT
- Title: Investigating the Encoding of Words in BERT's Neurons using Feature Textualization
- Authors: Tanja Baeumel, Soniya Vijayakumar, Josef van Genabith, Guenter Neumann, Simon Ostermann
- Abstract summary: We propose a technique to produce representations of neurons in the word embedding space.
We find that the produced representations can provide insights about the encoded knowledge in individual neurons.
- Score: 11.943486282441143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained language models (PLMs) form the basis of most state-of-the-art NLP
technologies. Nevertheless, they are essentially black boxes: Humans do not
have a clear understanding of what knowledge is encoded in different parts of
the models, especially in individual neurons. The situation is different in
computer vision, where feature visualization provides a decompositional
interpretability technique for neurons of vision models. Activation
maximization is used to synthesize inherently interpretable visual
representations of the information encoded in individual neurons. Our work is
inspired by this but presents a cautionary tale on the interpretability of
single neurons, based on the first large-scale attempt to adapt activation
maximization to NLP, and, more specifically, large PLMs. We propose feature
textualization, a technique to produce dense representations of neurons in the
PLM word embedding space. We apply feature textualization to the BERT model
(Devlin et al., 2019) to investigate whether the knowledge encoded in
individual neurons can be interpreted and symbolized. We find that the produced
representations can provide insights about the knowledge encoded in individual
neurons, but that individual neurons do not represent clearcut symbolic units
of language such as words. Additionally, we use feature textualization to
investigate how many neurons are needed to encode words in BERT.
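For intuition, the sketch below illustrates the general activation-maximization idea behind feature textualization, applied to BERT with PyTorch and Hugging Face transformers: a free input in the word embedding space is optimized by gradient ascent to excite a chosen neuron, and the optimized vectors are then projected onto their nearest vocabulary embeddings. The layer index, neuron index, sequence length, and optimization settings are illustrative assumptions, not the procedure or hyperparameters reported in the paper.
```python
# A minimal, hypothetical sketch: optimize a free input in BERT's word embedding
# space so that a chosen neuron's activation is maximized, then "textualize" the
# result by projecting onto the nearest vocabulary embeddings.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()
for p in model.parameters():          # keep BERT frozen; only the input is optimized
    p.requires_grad_(False)

layer_idx, neuron_idx = 8, 1234       # hypothetical target neuron (hidden dimension)
seq_len = 4                           # number of free "virtual token" positions
hidden = model.config.hidden_size
word_emb = model.embeddings.word_embeddings.weight   # (vocab_size, hidden)

inputs_embeds = torch.randn(1, seq_len, hidden, requires_grad=True)
optimizer = torch.optim.Adam([inputs_embeds], lr=0.1)

for step in range(200):
    optimizer.zero_grad()
    outputs = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
    # Mean activation of the target neuron over all positions in the chosen layer.
    activation = outputs.hidden_states[layer_idx][0, :, neuron_idx].mean()
    (-activation).backward()          # gradient ascent on the neuron's activation
    optimizer.step()

# Project each optimized position onto its nearest word embedding and decode.
with torch.no_grad():
    distances = torch.cdist(inputs_embeds[0], word_emb)  # (seq_len, vocab_size)
    nearest_ids = distances.argmin(dim=-1)
print(tokenizer.convert_ids_to_tokens(nearest_ids.tolist()))
```
In practice, constraining the optimized vectors to stay close to real token embeddings (or re-projecting them during optimization) tends to yield more readable textualizations; the sketch omits such regularization for brevity.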
Related papers
- Interpreting the Second-Order Effects of Neurons in CLIP [73.54377859089801]
We interpret the function of individual neurons in CLIP by automatically describing them using text.
We present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output.
Our results indicate that a scalable understanding of neurons can be used for model deception and for introducing new model capabilities.
arXiv Detail & Related papers (2024-06-06T17:59:52Z)
- Multimodal Neurons in Pretrained Text-Only Transformers [52.20828443544296]
We identify "multimodal neurons" that convert visual representations into corresponding text.
We show that multimodal neurons operate on specific visual concepts across inputs, and have a systematic causal effect on image captioning.
arXiv Detail & Related papers (2023-08-03T05:27:12Z)
- Neuron to Graph: Interpreting Language Model Neurons at Scale [8.32093320910416]
This paper introduces a novel automated approach designed to scale interpretability techniques across a vast array of neurons within Large Language Models.
We propose Neuron to Graph (N2G), an innovative tool that automatically extracts a neuron's behaviour from the dataset it was trained on and translates it into an interpretable graph.
arXiv Detail & Related papers (2023-05-31T14:44:33Z)
- Redundancy and Concept Analysis for Code-trained Language Models [5.726842555987591]
Code-trained language models have proven to be highly effective for various code intelligence tasks.
They can be challenging to train and deploy for many software engineering applications due to computational bottlenecks and memory constraints.
We perform the first neuron-level analysis for source code models to identify important neurons within latent representations.
arXiv Detail & Related papers (2023-05-01T15:22:41Z)
- BrainBERT: Self-supervised representation learning for intracranial recordings [18.52962864519609]
We create a reusable Transformer, BrainBERT, for intracranial recordings, bringing modern representation learning approaches to neuroscience.
Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, with higher accuracy and with much less data.
In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain like language models unlocked language.
arXiv Detail & Related papers (2023-02-28T07:40:37Z)
- Constraints on the design of neuromorphic circuits set by the properties of neural population codes [61.15277741147157]
In the brain, information is encoded, transmitted and used to inform behaviour.
Neuromorphic circuits need to encode information in a way compatible with that used by populations of neurons in the brain.
arXiv Detail & Related papers (2022-12-08T15:16:04Z)
- Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features [9.783560855840602]
This paper presents a generic neural decoding method called BraVL that uses multimodal learning of brain-visual-linguistic features.
We focus on modeling the relationships between brain, visual and linguistic features via multimodal deep generative models.
In particular, our BraVL model can be trained under various semi-supervised scenarios to incorporate the visual and textual features obtained from the extra categories.
arXiv Detail & Related papers (2022-10-13T05:49:33Z)
- Natural Language Descriptions of Deep Visual Features [50.270035018478666]
We introduce a procedure that automatically labels neurons with open-ended, compositional, natural language descriptions.
We use MILAN for analysis, characterizing the distribution and importance of neurons selective for attribute, category, and relational information in vision models.
We also use MILAN for auditing, surfacing neurons sensitive to protected categories like race and gender in models trained on datasets intended to obscure these features.
arXiv Detail & Related papers (2022-01-26T18:48:02Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
- Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.