Dodrio: Exploring Transformer Models with Interactive Visualization
- URL: http://arxiv.org/abs/2103.14625v1
- Date: Fri, 26 Mar 2021 17:39:37 GMT
- Title: Dodrio: Exploring Transformer Models with Interactive Visualization
- Authors: Zijie J. Wang, Robert Turko, Duen Horng Chau
- Abstract summary: Dodrio is an open-source interactive visualization tool to help NLP researchers and practitioners analyze attention mechanisms in transformer-based models with linguistic knowledge.
To facilitate the visual comparison of attention weights and linguistic knowledge, Dodrio applies different graph visualization techniques to represent attention weights with longer input text.
- Score: 10.603327364971559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Why do large pre-trained transformer-based models perform so well across a
wide variety of NLP tasks? Recent research suggests the key may lie in the
multi-headed attention mechanism's ability to learn and represent linguistic
information. Understanding how these models represent both syntactic and
semantic knowledge is vital to investigate why they succeed and fail, what they
have learned, and how they can improve. We present Dodrio, an open-source
interactive visualization tool to help NLP researchers and practitioners
analyze attention mechanisms in transformer-based models with linguistic
knowledge. Dodrio tightly integrates an overview that summarizes the roles of
different attention heads, and detailed views that help users compare attention
weights with the syntactic structure and semantic information in the input
text. To facilitate the visual comparison of attention weights and linguistic
knowledge, Dodrio applies different graph visualization techniques to represent
attention weights with longer input text. Case studies highlight how Dodrio
provides insights into understanding the attention mechanism in
transformer-based models. Dodrio is available at
https://poloclub.github.io/dodrio/.
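To make the underlying data concrete, the sketch below (an illustrative example, not Dodrio's implementation) extracts the per-layer, per-head attention weights that a tool like Dodrio visualizes, using a pretrained BERT model from the Hugging Face Transformers library; the example sentence and the layer/head indices are arbitrary choices for demonstration.

```python
# Illustrative sketch (not Dodrio's code): pull per-head attention weights
# from a pretrained BERT model, the raw data an attention-visualization
# tool overlays with syntactic and semantic information.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "The chef who ran to the store was out of food."  # example input
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
layer, head = 8, 10                        # arbitrary head chosen for illustration
attn = outputs.attentions[layer][0, head]  # (seq_len, seq_len)

# For each token, show the token it attends to most strongly, the kind of
# relation a linguistic overlay would compare against a dependency parse.
for i, tok in enumerate(tokens):
    j = int(attn[i].argmax())
    print(f"{tok:>10s} -> {tokens[j]:<10s} ({attn[i, j].item():.2f})")
```

A visualization tool would render these matrices per head and layer rather than print them; the print loop here only stands in for that step.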
Related papers
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z) - Attention Visualizer Package: Revealing Word Importance for Deeper Insight into Encoder-Only Transformer Models [0.0]
This report introduces the Attention Visualizer package.
It is crafted to visually illustrate the significance of individual words in encoder-only transformer-based models.
arXiv Detail & Related papers (2023-08-28T19:11:52Z) - Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z) - SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model learns from self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z) - VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers [45.42482446288144]
Recent advances in interpretability suggest we can project weights and hidden states of transformer-based language models to their vocabulary.
We investigate LM attention heads and memory values, the vectors the models dynamically create and recall while processing a given input.
We create a tool to visualize a forward pass of Generative Pre-trained Transformers (GPTs) as an interactive flow graph.
arXiv Detail & Related papers (2023-05-22T19:04:56Z) - AttentionViz: A Global View of Transformer Attention [60.82904477362676]
We present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers.
The main idea behind our method is to visualize a joint embedding of the query and key vectors used by transformer models to compute attention.
We create an interactive visualization tool, AttentionViz, based on these joint query-key embeddings; a minimal code sketch of the joint query-key projection appears after this list.
arXiv Detail & Related papers (2023-05-04T23:46:49Z) - Language-Driven Representation Learning for Robotics [115.93273609767145]
Recent work in visual representation learning for robotics demonstrates the viability of learning from large video datasets of humans performing everyday tasks.
We introduce a framework for language-driven representation learning from human videos and captions.
We find that Voltron's language-driven learning outperforms the prior state-of-the-art, especially on targeted problems requiring higher-level control.
arXiv Detail & Related papers (2023-02-24T17:29:31Z) - Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models [5.866941279460248]
We propose a visual analytics approach to understanding fine-tuning in attention-based language models.
Our visualization, Attention Flows, is designed to support users in querying, tracing, and comparing attention within layers, across layers, and amongst attention heads in Transformer-based language models.
arXiv Detail & Related papers (2020-09-03T19:56:30Z) - Adaptive Transformers for Learning Multimodal Representations [6.09170287691728]
We extend adaptive approaches to learn more about model interpretability and computational efficiency.
We study attention spans as well as sparse and structured dropout methods to understand how attention mechanisms extend to vision-and-language tasks.
arXiv Detail & Related papers (2020-05-15T12:12:57Z) - Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models [65.19308052012858]
Recent Transformer-based large-scale pre-trained models have revolutionized vision-and-language (V+L) research.
We present VALUE, a set of meticulously designed probing tasks to decipher the inner workings of multimodal pre-training.
Key observations: Pre-trained models exhibit a propensity for attending over text rather than images during inference.
arXiv Detail & Related papers (2020-05-15T01:06:54Z)
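Following up on the AttentionViz entry above, the sketch below illustrates the joint query-key embedding idea under stated assumptions: a pretrained BERT model from Hugging Face Transformers, an arbitrarily chosen layer and head, and PCA as the projection. It is a minimal illustration of the concept, not the authors' implementation.

```python
# Minimal sketch of a joint query-key embedding for one attention head
# (assumes bert-base-uncased; layer/head indices are arbitrary).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
layer, head = 4, 7
num_heads = model.config.num_attention_heads
head_dim = model.config.hidden_size // num_heads
self_attn = model.encoder.layer[layer].attention.self  # holds the query/key projections

with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # (num_layers + 1) tensors
    x = hidden_states[layer]                       # input to encoder layer `layer`
    q = self_attn.query(x).view(1, -1, num_heads, head_dim)[0, :, head]  # (seq, head_dim)
    k = self_attn.key(x).view(1, -1, num_heads, head_dim)[0, :, head]    # (seq, head_dim)

# Project the queries and keys of this head into a shared 2-D space with PCA;
# a tool would scatter-plot these points per token, colored by query vs. key.
joint = torch.cat([q, k], dim=0)          # (2 * seq_len, head_dim)
_, _, v = torch.pca_lowrank(joint, q=2)   # principal directions, (head_dim, 2)
coords = (joint - joint.mean(dim=0)) @ v  # (2 * seq_len, 2)

seq_len = q.shape[0]
queries_2d, keys_2d = coords[:seq_len], coords[seq_len:]
print(queries_2d.shape, keys_2d.shape)
```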
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.