Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
- URL: http://arxiv.org/abs/2308.14179v1
- Date: Sun, 27 Aug 2023 18:46:47 GMT
- Title: Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
- Authors: Vedant Palit and Rohan Pandey and Aryaman Arora and Paul Pu Liang
- Abstract summary: We adapt a unimodal causal tracing tool to BLIP to enable the study of the neural mechanisms underlying image-conditioned text generation.
We release our BLIP causal tracing tool as open source to enable further experimentation in vision-language mechanistic interpretability.
- Score: 27.51318030253248
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mechanistic interpretability seeks to understand the neural mechanisms that
enable specific behaviors in Large Language Models (LLMs) by leveraging
causality-based methods. While these approaches have identified neural circuits
that copy spans of text, capture factual knowledge, and more, they remain
unusable for multimodal models since adapting these tools to the
vision-language domain requires considerable architectural changes. In this
work, we adapt a unimodal causal tracing tool to BLIP to enable the study of
the neural mechanisms underlying image-conditioned text generation. We
demonstrate our approach on a visual question answering dataset, highlighting
the causal relevance of later layer representations for all tokens.
Furthermore, we release our BLIP causal tracing tool as open source to enable
further experimentation in vision-language mechanistic interpretability by the
community. Our code is available at
https://github.com/vedantpalit/Towards-Vision-Language-Mechanistic-Interpretability.
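For readers new to causal tracing, the sketch below illustrates the general restore-the-clean-state procedure that the paper adapts to image-conditioned generation. It is a minimal, framework-agnostic sketch, not the authors' released tool: the names `model`, `decoder_layers`, `clean_inputs`, `corrupt_inputs`, and `answer_id` are assumptions; the model is assumed to return HuggingFace-style outputs with a `.logits` field; hidden states are assumed to be batch-first `(batch, seq, dim)` tensors; and the clean and corrupted inputs are assumed to be token-aligned. Hooking BLIP's actual vision encoder and text decoder is what the released code handles.

```python
# Minimal causal-tracing sketch (restore-the-clean-state probe). NOT the
# authors' released tool: assumes a generic decoder whose transformer blocks
# are passed in as `decoder_layers`, a HuggingFace-style forward returning an
# object with `.logits`, and batch-first hidden states of shape (batch, seq, dim).
import torch

@torch.no_grad()
def causal_trace(model, decoder_layers, clean_inputs, corrupt_inputs, answer_id):
    """Return a (num_layers x seq_len) grid of restoration effects."""
    # 1) Clean run: cache each layer's hidden states via forward hooks.
    clean_states = {}

    def save_hook(idx):
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            clean_states[idx] = hidden.detach().clone()
        return hook

    handles = [layer.register_forward_hook(save_hook(i))
               for i, layer in enumerate(decoder_layers)]
    model(**clean_inputs)
    for h in handles:
        h.remove()

    # 2) Corrupted run (e.g. Gaussian noise already added to the image or
    #    question embeddings when building `corrupt_inputs`): damaged baseline.
    corrupt_prob = model(**corrupt_inputs).logits[0, -1].softmax(-1)[answer_id]

    # 3) Restoration runs: repeat the corrupted forward pass, but patch the
    #    clean hidden state back in at one (layer, token) position at a time.
    seq_len = clean_states[0].shape[1]
    effects = torch.zeros(len(decoder_layers), seq_len)
    for li, layer in enumerate(decoder_layers):
        for ti in range(seq_len):
            def patch_hook(module, inputs, output, li=li, ti=ti):
                hidden = output[0] if isinstance(output, tuple) else output
                hidden[:, ti] = clean_states[li][:, ti]
                return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

            h = layer.register_forward_hook(patch_hook)
            patched_prob = model(**corrupt_inputs).logits[0, -1].softmax(-1)[answer_id]
            h.remove()
            # Indirect effect: how much restoring this state recovers the answer.
            effects[li, ti] = (patched_prob - corrupt_prob).item()
    return effects
```

The resulting grid of indirect effects is the quantity that causal-tracing studies typically visualize as a heatmap over layers and token positions.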
Related papers
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI mitigate earlier limitations by providing improved explanations for Transformer models.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, the OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z)
- Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning [67.0609518552321]
We propose Machine Vision Therapy, which aims to rectify noisy predictions from vision models.
By fine-tuning on the denoised labels, model performance can be boosted in an unsupervised manner.
arXiv Detail & Related papers (2023-12-05T07:29:14Z)
- Sparse Autoencoders Find Highly Interpretable Features in Language Models [0.0]
Polysemanticity prevents us from identifying concise, human-understandable explanations for what neural networks are doing internally.
We use sparse autoencoders to reconstruct the internal activations of a language model.
Our method may serve as a foundation for future mechanistic interpretability work.
arXiv Detail & Related papers (2023-09-15T17:56:55Z)
- Neural Authorship Attribution: Stylometric Analysis on Large Language Models [16.63955074133222]
Large language models (LLMs) such as GPT-4, PaLM, and Llama have significantly propelled the generation of AI-crafted text.
With rising concerns about their potential misuse, there is a pressing need for AI-generated-text forensics.
arXiv Detail & Related papers (2023-08-14T17:46:52Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections [104.14624185375897]
mPLUG is a new vision-language foundation model for both cross-modal understanding and generation.
It achieves state-of-the-art results on a wide range of vision-language downstream tasks, such as image captioning, image-text retrieval, visual grounding and visual question answering.
arXiv Detail & Related papers (2022-05-24T11:52:06Z)
- AttViz: Online exploration of self-attention for transparent neural language modeling [7.574392147428978]
We propose AttViz, an online toolkit for exploration of self-attention: the real values associated with individual text tokens.
We show how existing deep learning pipelines can produce outputs suitable for AttViz, offering novel online visualizations of the attention heads and their aggregations with minimal effort.
arXiv Detail & Related papers (2020-05-12T12:21:40Z)
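As a concrete illustration of the per-token self-attention values such a toolkit visualizes, the snippet below extracts attention weights from an off-the-shelf Transformer via HuggingFace transformers; the model choice and the head/query aggregation are illustrative assumptions, not AttViz's prescribed input format.

```python
# Sketch: extracting per-token self-attention values with HuggingFace
# `transformers`. Model name and the aggregation over heads/queries are
# illustrative assumptions, not AttViz's required input format.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

text = "Causal tracing locates the hidden states that matter."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
last_layer = outputs.attentions[-1][0]        # (heads, seq, seq)
received = last_layer.mean(dim=0).sum(dim=0)  # avg over heads, then total attention each token receives
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for tok, score in zip(tokens, received.tolist()):
    print(f"{tok}\t{score:.3f}")
```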