Can LLMs facilitate interpretation of pre-trained language models?
- URL: http://arxiv.org/abs/2305.13386v2
- Date: Fri, 20 Oct 2023 13:24:00 GMT
- Title: Can LLMs facilitate interpretation of pre-trained language models?
- Authors: Basel Mousi, Nadir Durrani, Fahim Dalvi
- Abstract summary: We propose using a large language model, ChatGPT, as an annotator to enable fine-grained interpretation analysis of pre-trained language models.
We discover latent concepts within pre-trained language models by applying agglomerative hierarchical clustering over contextualized representations.
Our findings demonstrate that ChatGPT produces accurate and semantically richer annotations compared to human-annotated concepts.
- Score: 18.77022630961142
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Work done to uncover the knowledge encoded within pre-trained language models
relies on annotated corpora or human-in-the-loop methods. However, these
approaches are limited in terms of scalability and the scope of interpretation.
We propose using a large language model, ChatGPT, as an annotator to enable
fine-grained interpretation analysis of pre-trained language models. We
discover latent concepts within pre-trained language models by applying
agglomerative hierarchical clustering over contextualized representations and
then annotate these concepts using ChatGPT. Our findings demonstrate that
ChatGPT produces accurate and semantically richer annotations compared to
human-annotated concepts. Additionally, we showcase how GPT-based annotations
empower interpretation analysis methodologies, demonstrating two of them:
probing frameworks and neuron interpretation. To facilitate further exploration
and experimentation in the field, we make available a substantial ConceptNet
dataset (TCN) comprising 39,000 annotated concepts.
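The pipeline the abstract describes can be sketched concretely. The snippet below is a minimal illustration, not the authors' released code: it extracts contextualized token representations from a BERT-style encoder, groups them into latent concepts with agglomerative hierarchical clustering, and builds a prompt asking an LLM to label each concept. The model name, number of clusters, prompt wording, and the `annotate_with_llm` helper are assumptions made for illustration.

```python
# Minimal sketch of the pipeline described in the abstract, not the authors' released
# code. Model choice, number of clusters, prompt wording, and the annotate_with_llm
# helper are illustrative assumptions.
import numpy as np
import torch
from sklearn.cluster import AgglomerativeClustering
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")

sentences = [
    "The cat sat on the mat.",
    "Stocks rallied after the quarterly earnings report.",
]

tokens, vectors = [], []
for sent in sentences:
    enc = tokenizer(sent, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    for tok, vec in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist()), hidden):
        if tok not in ("[CLS]", "[SEP]"):  # drop special tokens
            tokens.append(tok)
            vectors.append(vec.numpy())

# Step 1: discover latent concepts by agglomerative hierarchical clustering
# over the contextualized token representations.
clustering = AgglomerativeClustering(n_clusters=5, linkage="ward")
cluster_ids = clustering.fit_predict(np.stack(vectors))

concepts = {}
for tok, cid in zip(tokens, cluster_ids):
    concepts.setdefault(int(cid), []).append(tok)

# Step 2: ask an LLM to label each concept. annotate_with_llm is a hypothetical
# helper wrapping a chat-completion API call; it is not defined here.
def build_prompt(members):
    return (
        "The following tokens were grouped together as one latent concept inside a "
        f"pre-trained language model. Give a short semantic label: {', '.join(members)}"
    )

for cid, members in concepts.items():
    prompt = build_prompt(members)
    # annotation = annotate_with_llm(prompt)  # assumed LLM call, omitted here
    print(cid, members)
```

In the paper, such GPT-generated concept labels then drive the two downstream analyses mentioned above, probing frameworks and neuron interpretation.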
Related papers
- Self-supervised Interpretable Concept-based Models for Text Classification [9.340843984411137]
This paper proposes self-supervised Interpretable Concept Embedding Models (ICEMs).
We leverage the generalization abilities of Large Language Models to predict concept labels in a self-supervised way.
ICEMs can be trained in a self-supervised fashion, achieving performance similar to fully supervised concept-based models and end-to-end black-box ones.
arXiv Detail & Related papers (2024-06-20T14:04:53Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z) - SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z) - Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions [2.7195102129095003]
We introduce a framework for testing category-level perceptual grounding in multi-modal language models.
We train separate neural networks to generate and interpret descriptions of visual categories.
We show that communicative success exposes performance issues in the generation model.
arXiv Detail & Related papers (2023-03-07T17:01:25Z) - ConceptX: A Framework for Latent Concept Analysis [21.760620298330235]
We present ConceptX, a human-in-the-loop framework for interpreting and annotating the latent representational space of pre-trained Language Models (pLMs).
We use an unsupervised method to discover concepts learned in these models and enable a graphical interface for humans to generate explanations for the concepts.
arXiv Detail & Related papers (2022-11-12T11:31:09Z) - Learnable Visual Words for Interpretable Image Recognition [70.85686267987744]
We propose the Learnable Visual Words (LVW) to interpret the model prediction behaviors with two novel modules.
The semantic visual words learning relaxes the category-specific constraint, enabling the general visual words shared across different categories.
Our experiments on six visual benchmarks demonstrate the superior effectiveness of our proposed LVW in both accuracy and model interpretation.
arXiv Detail & Related papers (2022-05-22T03:24:45Z) - A Survey of Knowledge Enhanced Pre-trained Models [28.160826399552462]
We refer to pre-trained language models with knowledge injection as knowledge-enhanced pre-trained language models (KEPLMs).
These models demonstrate deep understanding and logical reasoning and introduce interpretability.
arXiv Detail & Related papers (2021-10-01T08:51:58Z) - Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z) - Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work in evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.