The Language Interpretability Tool: Extensible, Interactive
Visualizations and Analysis for NLP Models
- URL: http://arxiv.org/abs/2008.05122v1
- Date: Wed, 12 Aug 2020 06:07:44 GMT
- Title: The Language Interpretability Tool: Extensible, Interactive
Visualizations and Analysis for NLP Models
- Authors: Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy
Coenen, Sebastian Gehrmann, Ellen Jiang, Mahima Pushkarna, Carey Radebaugh,
Emily Reif, Ann Yuan
- Abstract summary: Language Interpretability Tool (LIT) is an open-source platform for visualization and understanding of NLP models.
LIT integrates local explanations, aggregate analysis, and counterfactual generation into a streamlined, browser-based interface.
- Score: 17.423179212411263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the Language Interpretability Tool (LIT), an open-source platform
for visualization and understanding of NLP models. We focus on core questions
about model behavior: Why did my model make this prediction? When does it
perform poorly? What happens under a controlled change in the input? LIT
integrates local explanations, aggregate analysis, and counterfactual
generation into a streamlined, browser-based interface to enable rapid
exploration and error analysis. We include case studies for a diverse set of
workflows, including exploring counterfactuals for sentiment analysis,
measuring gender bias in coreference systems, and exploring local behavior in
text generation. LIT supports a wide range of models--including classification,
seq2seq, and structured prediction--and is highly extensible through a
declarative, framework-agnostic API. LIT is under active development, with code
and full documentation available at https://github.com/pair-code/lit.
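To make the declarative API concrete, the sketch below wires a toy dataset and a toy keyword-based "model" into LIT: each component exposes a spec describing its input and output fields, and the server is configured from those specs rather than from framework internals. This is a minimal sketch following the patterns in LIT's public documentation; module paths and method names (e.g. predict_minibatch) may differ across LIT versions, and the keyword heuristic stands in for a real classifier purely for illustration.

from absl import app
from lit_nlp import dev_server
from lit_nlp import server_flags
from lit_nlp.api import dataset as lit_dataset
from lit_nlp.api import model as lit_model
from lit_nlp.api import types as lit_types

LABELS = ["negative", "positive"]

class ToySentimentData(lit_dataset.Dataset):
  """A few hand-written examples; a real demo would load them from disk."""
  def __init__(self):
    self._examples = [
        {"sentence": "A delightful, warm-hearted film.", "label": "positive"},
        {"sentence": "Dull, lifeless, and far too long.", "label": "negative"},
    ]

  def spec(self):
    # Declares the fields LIT should expect in each example.
    return {
        "sentence": lit_types.TextSegment(),
        "label": lit_types.CategoryLabel(vocab=LABELS),
    }

class ToySentimentModel(lit_model.Model):
  """Keyword heuristic standing in for a real classifier (e.g. a fine-tuned BERT)."""
  POSITIVE_WORDS = {"delightful", "warm-hearted", "great", "charming"}

  def input_spec(self):
    return {"sentence": lit_types.TextSegment()}

  def output_spec(self):
    # 'parent' links the prediction field to the gold label field for metrics.
    return {"probas": lit_types.MulticlassPreds(vocab=LABELS, parent="label")}

  def predict_minibatch(self, inputs):
    # Assign a higher positive probability for each positive keyword found.
    for ex in inputs:
      hits = sum(w in ex["sentence"].lower() for w in self.POSITIVE_WORDS)
      p_pos = min(0.5 + 0.25 * hits, 0.99)
      yield {"probas": [1.0 - p_pos, p_pos]}

def main(_):
  models = {"toy_sentiment": ToySentimentModel()}
  datasets = {"toy_examples": ToySentimentData()}
  lit_demo = dev_server.Server(models, datasets, **server_flags.get_flags())
  lit_demo.serve()

if __name__ == "__main__":
  app.run(main)

Running the script should start a local LIT server; opening the printed URL in a browser loads the UI with the declared model and dataset available for exploration.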
Related papers
- Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval-augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
- How and where does CLIP process negation? [2.5600000778964294]
We build on the existence task from the VALSE benchmark to test models' understanding of negation.
We take inspiration from the model interpretability literature to explain the behaviour of vision-language (VL) models when handling negation.
arXiv Detail & Related papers (2024-07-15T07:20:06Z)
- Perplexed: Understanding When Large Language Models are Confused [3.4208414448496027]
This paper introduces perplexed, a library for exploring where a language model is perplexed.
We conducted a case study focused on Large Language Models (LLMs) for code generation using an additional tool we built to help with the analysis of code models called codetokenizer.
We found that our studied code LLMs had their worst performance on coding structures where the code was not syntactically correct.
arXiv Detail & Related papers (2024-04-09T22:03:39Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (CLS) aims at generating, from a document in a source language, a concise summary in a different target language.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z)
- A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations of how each layer of a model retains information about the input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site.