DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
- URL: http://arxiv.org/abs/2310.01870v2
- Date: Tue, 28 Nov 2023 19:26:33 GMT
- Title: DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
- Authors: Albert Garde, Esben Kran, Fazl Barez
- Abstract summary: DeepDecipher is an API and interface for probing neurons in transformer models' MLP layers.
This paper outlines DeepDecipher's design and capabilities.
We demonstrate how to analyze neurons, compare models, and gain insights into model behavior.
- Score: 2.992602379681373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) become more capable, there is an urgent need
for interpretable and transparent tools. Current methods are difficult to
implement, and accessible tools to analyze model internals are lacking. To
bridge this gap, we present DeepDecipher - an API and interface for probing
neurons in transformer models' MLP layers. DeepDecipher makes the outputs of
advanced interpretability techniques for LLMs readily available. The
easy-to-use interface also makes inspecting these complex models more
intuitive. This paper outlines DeepDecipher's design and capabilities. We
demonstrate how to analyze neurons, compare models, and gain insights into
model behavior. For example, we contrast DeepDecipher's functionality with
similar tools like Neuroscope and OpenAI's Neuron Explainer. DeepDecipher
enables efficient, scalable analysis of LLMs. By granting access to
state-of-the-art interpretability methods, DeepDecipher makes LLMs more
transparent, trustworthy, and safe. Researchers, engineers, and developers can
quickly diagnose issues, audit systems, and advance the field.
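The abstract does not spell out DeepDecipher's actual endpoints, so the sketch below is only a minimal, hedged illustration of the underlying operation such a tool surfaces: recording the post-activation values of a single MLP neuron with a PyTorch forward hook. The model name, layer index, and neuron index are arbitrary placeholders, and this is not DeepDecipher's interface.

```python
# Minimal sketch of MLP-neuron probing, the kind of signal DeepDecipher surfaces.
# Not DeepDecipher's API: the model name, layer index, and neuron index below
# are arbitrary placeholders chosen for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_name = "gpt2"
tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

layer, neuron = 5, 123   # hypothetical neuron to inspect
activations = {}

def record_mlp(module, inputs, output):
    # Output of the MLP's activation function: shape (batch, seq_len, 4 * d_model).
    activations["mlp"] = output.detach()

# Hook the activation inside block `layer`'s MLP to capture per-neuron values.
handle = model.transformer.h[layer].mlp.act.register_forward_hook(record_mlp)

text = "The Eiffel Tower is located in"
with torch.no_grad():
    model(**tokenizer(text, return_tensors="pt"))
handle.remove()

# Per-token activation of the chosen neuron, e.g. to spot which tokens excite it.
per_token = activations["mlp"][0, :, neuron]
for tok, act in zip(tokenizer.tokenize(text), per_token.tolist()):
    print(f"{tok!r:>12}  {act:+.3f}")
```

Printed per-token activations of this kind are the raw signal that neuron-level tools such as DeepDecipher or Neuroscope aggregate into views like maximally activating dataset examples.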
Related papers
- LLMs for Explainable AI: A Comprehensive Survey [0.7373617024876725]
Large Language Models (LLMs) offer a promising approach to enhancing Explainable AI (XAI).
LLMs transform complex machine learning outputs into easy-to-understand narratives.
LLMs can bridge the gap between sophisticated model behavior and human interpretability.
arXiv Detail & Related papers (2025-03-31T18:19:41Z)
- LogitLens4LLMs: Extending Logit Lens Analysis to Modern Large Language Models [6.002736042809241]
LogitLens4LLMs is a toolkit that extends the Logit Lens technique to modern large language models.
Our work overcomes the limitations of existing implementations, enabling the technique to be applied to state-of-the-art architectures.
arXiv Detail & Related papers (2025-02-24T03:37:44Z)
- DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models [1.747623282473278]
Deep learning models operate as opaque 'black boxes' with limited transparency in their decision-making processes.
This study addresses the pressing need for interpretability in AI systems, emphasizing its role in fostering trust, ensuring accountability, and promoting responsible deployment in mission-critical fields.
We introduce DLBacktrace, an innovative technique developed by the AryaXAI team to illuminate model decisions across a wide array of domains.
arXiv Detail & Related papers (2024-11-19T16:54:30Z)
- Perplexed: Understanding When Large Language Models are Confused [3.4208414448496027]
This paper introduces perplexed, a library for exploring where a language model is perplexed.
We conducted a case study on Large Language Models (LLMs) for code generation, using codetokenizer, an additional tool we built to aid the analysis of code models.
We found that the code LLMs we studied performed worst on code structures that were not syntactically correct.
arXiv Detail & Related papers (2024-04-09T22:03:39Z)
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics [46.99625341531352]
DeepFakes, which refer to AI-generated media content, have become an increasing concern due to their use as a means for disinformation.
We investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection.
arXiv Detail & Related papers (2024-03-21T01:57:30Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be explained to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP [27.51318030253248]
We adapt a unimodal causal tracing tool to BLIP to enable the study of the neural mechanisms underlying image-conditioned text generation.
We release our BLIP causal tracing tool as open source to enable further experimentation in vision-language mechanistic interpretability.
arXiv Detail & Related papers (2023-08-27T18:46:47Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner-workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Augmented Language Models: a Survey [55.965967655575454]
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs).
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z)