Discovering Latent Concepts Learned in BERT
- URL: http://arxiv.org/abs/2205.07237v1
- Date: Sun, 15 May 2022 09:45:34 GMT
- Title: Discovering Latent Concepts Learned in BERT
- Authors: Fahim Dalvi, Abdul Rafae Khan, Firoj Alam, Nadir Durrani, Jia Xu,
Hassan Sajjad
- Abstract summary: We study what latent concepts exist in the pre-trained BERT model.
We also release a novel BERT ConceptNet dataset (BCN) consisting of 174 concept labels and 1M annotated instances.
- Score: 21.760620298330235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A large number of studies that analyze deep neural network models and their
ability to encode various linguistic and non-linguistic concepts provide an
interpretation of the inner mechanics of these models. The scope of the
analyses is limited to pre-defined concepts that reinforce traditional
linguistic knowledge and do not reflect how novel concepts are learned by
the model. We address this limitation by discovering and analyzing latent
concepts learned in neural network models in an unsupervised fashion and
provide interpretations from the model's perspective. In this work, we study:
i) what latent concepts exist in the pre-trained BERT model, ii) how the
discovered latent concepts align or diverge from classical linguistic hierarchy
and iii) how the latent concepts evolve across layers. Our findings show: i) a
model learns novel concepts (e.g. animal categories and demographic groups),
which do not strictly adhere to any pre-defined categorization (e.g. POS,
semantic tags), ii) several latent concepts are based on multiple properties
which may include semantics, syntax, and morphology, iii) the lower layers in
the model dominate in learning shallow lexical concepts while the higher layers
learn semantic relations and iv) the discovered latent concepts highlight
potential biases learned in the model. We also release a novel BERT ConceptNet
dataset (BCN) consisting of 174 concept labels and 1M annotated instances.
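The discovery step described in the abstract, clustering contextual token representations so that each cluster can be inspected as a candidate latent concept, can be approximated in a few lines. The snippet below is a minimal illustrative sketch rather than the authors' released pipeline: the model name, layer index, example sentences, and cluster count are arbitrary assumptions, and agglomerative clustering is used as one plausible choice of algorithm.

```python
# Sketch: extract per-token BERT representations from one layer and cluster them;
# each resulting cluster is a candidate "latent concept" to inspect or label.
import torch
from sklearn.cluster import AgglomerativeClustering
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)
model.eval()

sentences = ["The cat sat on the mat.", "Stocks fell sharply on Monday."]  # illustrative
layer = 9                      # arbitrary middle-to-upper layer, chosen for illustration
tokens, vectors = [], []

with torch.no_grad():
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt")
        hidden = model(**enc).hidden_states[layer][0]          # (seq_len, hidden_size)
        ids = enc["input_ids"][0].tolist()
        for tok, vec in zip(tokenizer.convert_ids_to_tokens(ids), hidden):
            if tok not in ("[CLS]", "[SEP]"):
                tokens.append(tok)
                vectors.append(vec.numpy())

# Group the token representations; the cluster count is an illustrative choice.
labels = AgglomerativeClustering(n_clusters=5).fit_predict(vectors)
for cluster_id in sorted(set(labels)):
    members = [t for t, c in zip(tokens, labels) if c == cluster_id]
    print(cluster_id, members)
```

Clusters obtained this way can then be compared against pre-defined categories such as POS or semantic tags to see where they align with, or diverge from, the classical linguistic hierarchy (point ii above).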
Related papers
- Restyling Unsupervised Concept Based Interpretable Networks with Generative Models [14.604305230535026]
We propose a novel method that relies on mapping the concept features to the latent space of a pretrained generative model.
We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts.
arXiv Detail & Related papers (2024-07-01T14:39:41Z)
- A Concept-Based Explainability Framework for Large Multimodal Models [52.37626977572413]
We propose a dictionary learning based approach, applied to the representation of tokens.
We show that these concepts are well semantically grounded in both vision and text.
We show that the extracted multimodal concepts are useful to interpret representations of test samples.
arXiv Detail & Related papers (2024-06-12T10:48:53Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Hierarchical Semantic Tree Concept Whitening for Interpretable Image Classification [19.306487616731765]
Post-hoc analysis can only discover the patterns or rules that naturally exist in models.
We proactively instill knowledge to alter the representation of human-understandable concepts in hidden layers.
Our method improves model interpretability, showing better disentanglement of semantic concepts, without negatively affecting model classification performance.
arXiv Detail & Related papers (2023-07-10T04:54:05Z)
- ConceptX: A Framework for Latent Concept Analysis [21.760620298330235]
We present ConceptX, a human-in-the-loop framework for interpreting and annotating the latent representational space of pre-trained Language Models (pLMs).
We use an unsupervised method to discover concepts learned in these models and enable a graphical interface for humans to generate explanations for the concepts.
arXiv Detail & Related papers (2022-11-12T11:31:09Z)
- Analyzing Encoded Concepts in Transformer Language Models [21.76062029833023]
ConceptX analyses how latent concepts are encoded in representations learned within pre-trained language models.
It uses clustering to discover the encoded concepts and explains them by aligning with a large set of human-defined concepts; a minimal sketch of such an alignment score appears after this list.
arXiv Detail & Related papers (2022-06-27T13:32:10Z)
- Learnable Visual Words for Interpretable Image Recognition [70.85686267987744]
We propose the Learnable Visual Words (LVW) to interpret the model prediction behaviors with two novel modules.
The semantic visual words learning relaxes the category-specific constraint, enabling the general visual words shared across different categories.
Our experiments on six visual benchmarks demonstrate the superior effectiveness of our proposed LVW in both accuracy and model interpretation.
arXiv Detail & Related papers (2022-05-22T03:24:45Z)
- A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts.
In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images.
We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z)
- Formalising Concepts as Grounded Abstractions [68.24080871981869]
This report shows how representation learning can be used to induce concepts from raw data.
The main technical goal of this report is to show how techniques from representation learning can be married with a lattice-theoretic formulation of conceptual spaces.
arXiv Detail & Related papers (2021-01-13T15:22:01Z)
- Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
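Several of the entries above, and point ii of the main abstract, describe aligning discovered clusters with human-defined concepts. The sketch below shows one simple way such an alignment score could be computed: the fraction of a cluster covered by its majority human label. All names and data in the example are hypothetical and stand in for real cluster assignments and annotations.

```python
# Sketch: score how well each discovered cluster matches a human-defined concept
# (here, part-of-speech tags), using majority-label coverage as the alignment score.
from collections import Counter

# cluster_id -> tokens assigned to that cluster by the unsupervised step (hypothetical)
clusters = {
    0: ["cat", "dog", "mat", "stocks"],   # mostly nouns
    1: ["sat", "fell", "ran", "dog"],     # mostly verbs, one noun
}
# token -> human-defined concept label, e.g. a POS tag (hypothetical)
pos_tags = {
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN", "stocks": "NOUN",
    "sat": "VERB", "fell": "VERB", "ran": "VERB",
}

def alignment(cluster_tokens, labels):
    """Return the majority concept label and the fraction of the cluster it covers."""
    counts = Counter(labels[tok] for tok in cluster_tokens if tok in labels)
    label, hits = counts.most_common(1)[0]
    return label, hits / len(cluster_tokens)

for cid, toks in clusters.items():
    label, score = alignment(toks, pos_tags)
    print(f"cluster {cid}: best match {label} ({score:.0%} of tokens)")
```

A cluster whose majority label covers most of its tokens aligns well with that human-defined concept; a low score suggests a latent concept that cuts across the pre-defined categories.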
This list is automatically generated from the titles and abstracts of the papers on this site.