DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
- URL: http://arxiv.org/abs/2310.01870v2
- Date: Tue, 28 Nov 2023 19:26:33 GMT
- Title: DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
- Authors: Albert Garde, Esben Kran, Fazl Barez
- Abstract summary: DeepDecipher is an API and interface for probing neurons in transformer models' MLP layers.
This paper outlines DeepDecipher's design and capabilities.
We demonstrate how to analyze neurons, compare models, and gain insights into model behavior.
- Score: 2.992602379681373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) become more capable, there is an urgent need
for interpretable and transparent tools. Current methods are difficult to
implement, and accessible tools to analyze model internals are lacking. To
bridge this gap, we present DeepDecipher - an API and interface for probing
neurons in transformer models' MLP layers. DeepDecipher makes the outputs of
advanced interpretability techniques for LLMs readily available. The
easy-to-use interface also makes inspecting these complex models more
intuitive. This paper outlines DeepDecipher's design and capabilities. We
demonstrate how to analyze neurons, compare models, and gain insights into
model behavior. For example, we contrast DeepDecipher's functionality with
similar tools like Neuroscope and OpenAI's Neuron Explainer. DeepDecipher
enables efficient, scalable analysis of LLMs. By granting access to
state-of-the-art interpretability methods, DeepDecipher makes LLMs more
transparent, trustworthy, and safe. Researchers, engineers, and developers can
quickly diagnose issues, audit systems, and advance the field.
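The abstract does not spell out DeepDecipher's actual endpoints, so the sketch below is only a minimal, hedged illustration of the underlying operation such a tool surfaces: recording the post-activation values of a single MLP neuron with a PyTorch forward hook. The model name, layer index, and neuron index are arbitrary placeholders, and this is not DeepDecipher's interface.

```python
# Minimal sketch of MLP-neuron probing, the kind of signal DeepDecipher surfaces.
# Not DeepDecipher's API: the model name, layer index, and neuron index below
# are arbitrary placeholders chosen for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_name = "gpt2"
tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

layer, neuron = 5, 123   # hypothetical neuron to inspect
activations = {}

def record_mlp(module, inputs, output):
    # Output of the MLP's activation function: shape (batch, seq_len, 4 * d_model).
    activations["mlp"] = output.detach()

# Hook the activation inside block `layer`'s MLP to capture per-neuron values.
handle = model.transformer.h[layer].mlp.act.register_forward_hook(record_mlp)

text = "The Eiffel Tower is located in"
with torch.no_grad():
    model(**tokenizer(text, return_tensors="pt"))
handle.remove()

# Per-token activation of the chosen neuron, e.g. to spot which tokens excite it.
per_token = activations["mlp"][0, :, neuron]
for tok, act in zip(tokenizer.tokenize(text), per_token.tolist()):
    print(f"{tok!r:>12}  {act:+.3f}")
```

Printed per-token activations of this kind are the raw signal that neuron-level tools such as DeepDecipher or Neuroscope aggregate into views like maximally activating dataset examples.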
Related papers
- LLMs for Explainable AI: A Comprehensive Survey [0.7373617024876725]
Large Language Models (LLMs) offer a promising approach to enhancing Explainable AI (XAI).
LLMs transform complex machine learning outputs into easy-to-understand narratives.
LLMs can bridge the gap between sophisticated model behavior and human interpretability.
arXiv Detail & Related papers (2025-03-31T18:19:41Z)
- LogitLens4LLMs: Extending Logit Lens Analysis to Modern Large Language Models [6.002736042809241]
LogitLens4LLMs is a toolkit that extends the Logit Lens technique to modern large language models.
Our work overcomes the limitations of existing implementations, enabling the technique to be applied to state-of-the-art architectures.
arXiv Detail & Related papers (2025-02-24T03:37:44Z)
- DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models [1.747623282473278]
Deep learning models operate as opaque 'black boxes' with limited transparency in their decision-making processes.
This study addresses the pressing need for interpretability in AI systems, emphasizing its role in fostering trust, ensuring accountability, and promoting responsible deployment in mission-critical fields.
We introduce DLBacktrace, an innovative technique developed by the AryaXAI team to illuminate model decisions across a wide array of domains.
arXiv Detail & Related papers (2024-11-19T16:54:30Z)
- Perplexed: Understanding When Large Language Models are Confused [3.4208414448496027]
This paper introduces perplexed, a library for exploring where a language model is perplexed.
We conducted a case study on Large Language Models (LLMs) for code generation, using codetokenizer, an additional tool we built to aid the analysis of code models.
We found that the code LLMs we studied performed worst on code structures that were not syntactically correct.
arXiv Detail & Related papers (2024-04-09T22:03:39Z)
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics [46.99625341531352]
DeepFakes, which refer to AI-generated media content, have become an increasing concern due to their use as a means for disinformation.
We investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection.
arXiv Detail & Related papers (2024-03-21T01:57:30Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be explained to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP [27.51318030253248]
We adapt a unimodal causal tracing tool to BLIP to enable the study of the neural mechanisms underlying image-conditioned text generation.
We release our BLIP causal tracing tool as open source to enable further experimentation in vision-language mechanistic interpretability.
arXiv Detail & Related papers (2023-08-27T18:46:47Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner-workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Augmented Language Models: a Survey [55.965967655575454]
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs).
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z)