LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models
- URL: http://arxiv.org/abs/2404.07004v1
- Date: Wed, 10 Apr 2024 13:39:11 GMT
- Title: LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models
- Authors: Igor Tufanov, Karen Hambardzumyan, Javier Ferrando, Elena Voita
- Abstract summary: LM Transparency Tool (LM-TT) is an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models.
It shows the important parts of the whole input-to-output information flow.
- Score: 10.452149013566157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the LM Transparency Tool (LM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models. Unlike previously existing tools that focus on isolated parts of the decision-making process, our framework is designed to make the entire prediction process transparent and allows tracing model behavior back from the top-layer representation to very fine-grained parts of the model. Specifically, it (1) shows the important parts of the whole input-to-output information flow, (2) attributes any changes made by a model block to individual attention heads and feed-forward neurons, and (3) allows interpreting the functions of those heads or neurons. A crucial part of this pipeline is showing the importance of specific model components at each step. As a result, the roles of model components can be inspected exactly in the cases where they matter for a prediction. Since knowing which components to inspect is key for analyzing large models, where the number of such components is extremely high, we believe our tool will greatly support the interpretability community both in research settings and in practical applications.
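To make step (2) concrete, below is a minimal sketch (toy PyTorch with random stand-in weights, not LM-TT's actual API) of decomposing an attention block's residual-stream update into per-head contributions and scoring each head's direct effect on a target token's logit:

```python
# A minimal sketch (not LM-TT's actual API): decompose an attention block's
# output update into per-head contributions, then rank heads by how much each
# contribution moves the logit of a target token.
import torch

torch.manual_seed(0)
d_model, n_heads, vocab = 64, 8, 100
d_head = d_model // n_heads

# Toy stand-ins for trained weights (hypothetical, for illustration only).
W_O = torch.randn(n_heads, d_head, d_model) / d_model ** 0.5  # per-head output projection
W_U = torch.randn(d_model, vocab) / d_model ** 0.5            # unembedding matrix

# Pretend each head already produced its attention-weighted value vector
# at the position being explained.
head_outputs = torch.randn(n_heads, d_head)

# Each head's additive contribution to the residual-stream update.
contributions = torch.einsum("hd,hdm->hm", head_outputs, W_O)  # (n_heads, d_model)

# A head's direct effect on a target token's logit is the projection of its
# contribution onto that token's unembedding direction.
target = 42
effect = contributions @ W_U[:, target]                        # (n_heads,)

for h in effect.argsort(descending=True):
    print(f"head {h.item()}: direct logit effect {effect[h].item():+.3f}")
```

The same decomposition applies to feed-forward blocks by treating each neuron's weighted output row as a component.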
Related papers
- Towards Unifying Feature Interaction Models for Click-Through Rate Prediction [19.149554121852724]
We propose a general framework called IPA to unify existing models.
We demonstrate that most existing models can be categorized within our framework by making specific choices for its three components.
We introduce a novel model that achieves competitive results compared to state-of-the-art CTR models.
arXiv Detail & Related papers (2024-11-19T12:04:02Z)
- CITI: Enhancing Tool Utilizing Ability in Large Language Models without Sacrificing General Performance [17.723293304671877]
We propose a Component-based Tool-utilizing ability Injection method (CITI).
Guided by the gradient-based importance scores of different components, CITI alleviates the capability conflicts caused by the fine-tuning process.
Experimental results demonstrate that our approach achieves outstanding performance across a range of evaluation metrics.
arXiv Detail & Related papers (2024-09-20T04:06:28Z)
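As a rough illustration of the gradient-based importance scoring mentioned above, here is a generic |parameter × gradient| proxy on a toy model (an assumption for illustration, not CITI's exact method):

```python
# A minimal sketch (not CITI's exact method): score the components of a toy
# model with the common first-order importance proxy |parameter * gradient|,
# aggregated per module, on one batch of task data.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

# Sum |w * grad| over each submodule's parameters.
scores = {}
for name, param in model.named_parameters():
    module = name.rsplit(".", 1)[0]
    scores[module] = scores.get(module, 0.0) + (param * param.grad).abs().sum().item()

for module, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"module {module}: importance {score:.3f}")
```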
- Decomposing and Editing Predictions by Modeling Model Computation [75.37535202884463]
We introduce a task called component modeling.
The goal of component modeling is to decompose an ML model's prediction in terms of its components.
We present COAR, a scalable algorithm for estimating component attributions.
arXiv Detail & Related papers (2024-04-17T16:28:08Z)
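For the component-attribution idea above, a brute-force baseline is to zero-ablate each component and measure the output shift; COAR itself estimates such attributions scalably rather than by exhaustive ablation. A toy sketch:

```python
# A brute-force baseline (not COAR itself): estimate each component's
# attribution by zero-ablating it and measuring the change in the model's
# output for one example.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(1, 16)

with torch.no_grad():
    base = model(x).item()
    # Treat each hidden unit of the first layer as a "component".
    for unit in range(32):
        saved_w = model[0].weight[unit].clone()
        saved_b = model[0].bias[unit].clone()
        model[0].weight[unit] = 0.0   # ablate the unit
        model[0].bias[unit] = 0.0
        delta = base - model(x).item()
        model[0].weight[unit] = saved_w   # restore
        model[0].bias[unit] = saved_b
        if abs(delta) > 0.05:
            print(f"unit {unit}: ablation shifts the output by {delta:+.3f}")
```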
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- Deciphering AutoML Ensembles: cattleia's Assistance in Decision-Making [0.0]
Cattleia is an application that deciphers AutoML ensembles for regression, multiclass, and binary classification tasks.
It works with models built by three AutoML packages: auto-sklearn, AutoGluon, and FLAML.
arXiv Detail & Related papers (2024-03-19T11:56:21Z)
- Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking [53.66999416757543]
We study how fine-tuning affects the internal mechanisms implemented in language models.
Fine-tuning enhances, rather than alters, the mechanistic operation of the model.
arXiv Detail & Related papers (2024-02-22T18:59:24Z)
- VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers [45.42482446288144]
Recent advances in interpretability suggest we can project weights and hidden states of transformer-based language models to their vocabulary.
We investigate LM attention heads and memory values, the vectors the models dynamically create and recall while processing a given input.
We create a tool to visualize a forward pass of Generative Pre-trained Transformers (GPTs) as an interactive flow graph.
arXiv Detail & Related papers (2023-05-22T19:04:56Z)
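The projection of hidden states to the vocabulary described above is often called the logit lens; here is a minimal sketch using GPT-2 from Hugging Face transformers (model download assumed; this is a generic recipe, not VISIT's own code):

```python
# A minimal logit-lens sketch: project every layer's last-position hidden
# state through the final layer norm and the unembedding to watch the
# predicted token evolve across layers.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states holds the embedding output plus one tensor per layer.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    top = logits.argmax().item()
    print(f"layer {layer:2d}: top next-token guess {tok.decode([top])!r}")
```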
- Understanding Programmatic Weak Supervision via Source-aware Influence Function [76.74549130841383]
Programmatic Weak Supervision (PWS) aggregates the source votes of multiple weak supervision sources into probabilistic training labels.
We build on the Influence Function (IF) to decompose the end model's training objective and then calculate the influence associated with each (data, source, class) tuple.
These primitive influence scores can then be used to estimate the influence of individual PWS components, such as source votes, supervision sources, and training data.
arXiv Detail & Related papers (2022-05-25T15:57:24Z)
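A heavily simplified, Hessian-free sketch of the decomposition above: the training objective is split into per-source terms, and each term's influence on a validation loss is approximated by a gradient dot product (the paper's actual IF computation is more involved):

```python
# A first-order, Hessian-free influence sketch on a toy weakly supervised
# setup: each source votes a label for every example, and we approximate each
# source's influence on a validation loss by a gradient dot product.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, n_sources, n_classes = 20, 10, 3, 2
X = torch.randn(n, d)
votes = torch.randint(0, n_classes, (n_sources, n))  # each source's vote per example
model = nn.Linear(d, n_classes)

x_val, y_val = torch.randn(1, d), torch.tensor([1])

def grad_vec(loss):
    # Flatten the gradients of the loss w.r.t. all parameters into one vector.
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

g_val = grad_vec(nn.functional.cross_entropy(model(x_val), y_val))

# Decompose the training objective into per-source terms and score each one.
for s in range(n_sources):
    loss_s = nn.functional.cross_entropy(model(X), votes[s])
    influence = -torch.dot(grad_vec(loss_s), g_val).item()
    print(f"source {s}: approx influence on val loss {influence:+.4f}")
```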
- ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models [102.63817106363597]
We build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented visual models.
It consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
We will release our toolkit and evaluation platforms for the research community.
arXiv Detail & Related papers (2022-04-19T10:23:42Z)
- VisBERT: Hidden-State Visualizations for Transformers [66.86452388524886]
We present VisBERT, a tool for visualizing the contextual token representations within BERT for the task of (multi-hop) Question Answering.
VisBERT enables users to get insights about the model's internal state and to explore its inference steps or potential shortcomings.
arXiv Detail & Related papers (2020-11-09T15:37:43Z)
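A minimal sketch of extracting the per-layer hidden states such a tool visualizes, using Hugging Face transformers (model download assumed; tracking per-token drift is an illustrative choice, not VisBERT's interface):

```python
# Pull every layer's hidden states from BERT and report, per layer, which
# token's representation moved the most in that layer.
import torch
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

inputs = tok("Who wrote Hamlet?", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
# out.hidden_states: the embedding output plus one (1, seq, 768) tensor per layer.
for layer in range(1, len(out.hidden_states)):
    drift = (out.hidden_states[layer] - out.hidden_states[layer - 1])[0].norm(dim=-1)
    print(f"layer {layer:2d}: largest update at token {tokens[drift.argmax().item()]!r}")
```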
- ViCE: Visual Counterfactual Explanations for Machine Learning Models [13.94542147252982]
We present an interactive visual analytics tool, ViCE, that generates counterfactual explanations to contextualize and evaluate model decisions.
Results are effectively displayed in a visual interface where counterfactual explanations are highlighted and interactive methods are provided for users to explore the data and model.
arXiv Detail & Related papers (2020-03-05T04:43:02Z)
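A toy counterfactual search in the spirit of the above (greedy nudging of the most influential feature of a linear scorer; ViCE's actual algorithm and interface are richer):

```python
# Greedily nudge one feature of an example until a toy linear scorer flips
# its decision, then report the minimal change that was found.
import numpy as np

w, b = np.array([0.8, -0.4, 0.3, 0.1]), -0.5   # toy linear "model"
predict = lambda x: float(x @ w + b) > 0

x = np.array([0.2, -0.1, 0.3, 0.1])            # currently classified negative

cf = x.copy()
while not predict(cf):
    i = np.argmax(np.abs(w))                   # feature with the largest weight
    cf[i] += 0.05 * np.sign(w[i])              # nudge it toward the boundary

print("counterfactual:", np.round(cf, 2))
print("change needed: ", np.round(cf - x, 2))
```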