Know-MRI: A Knowledge Mechanisms Revealer&Interpreter for Large Language Models
- URL: http://arxiv.org/abs/2506.08427v1
- Date: Tue, 10 Jun 2025 04:03:02 GMT
- Title: Know-MRI: A Knowledge Mechanisms Revealer&Interpreter for Large Language Models
- Authors: Jiaxiang Liu, Boxuan Xing, Chenhao Yuan, Chenxiang Zhang, Di Wu, Xiusheng Huang, Haida Yu, Chuhan Lang, Pengfei Cao, Jun Zhao, Kang Liu
- Abstract summary: We present an open-source Knowledge Mechanisms Revealer&Interpreter (Know-MRI) designed to systematically analyze the knowledge mechanisms within large language models (LLMs). Specifically, we have developed an extensible core module that can automatically match different input data with interpretation methods and consolidate the interpreting outputs.
- Score: 17.316882613263818
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: As large language models (LLMs) continue to advance, there is a growing urgency to enhance the interpretability of their internal knowledge mechanisms. Consequently, many interpretation methods have emerged, each aiming to unravel the knowledge mechanisms of LLMs from a different perspective. However, these methods differ in their input data formats and interpreting outputs, and the tools that integrate them can each support only tasks with specific inputs, significantly constraining their practical application. To address these challenges, we present an open-source Knowledge Mechanisms Revealer&Interpreter (Know-MRI) designed to systematically analyze the knowledge mechanisms within LLMs. Specifically, we have developed an extensible core module that can automatically match different input data with interpretation methods and consolidate the interpreting outputs. This enables users to freely choose appropriate interpretation methods based on their inputs, making it easier to comprehensively diagnose the model's internal knowledge mechanisms from multiple perspectives. Our code is available at https://github.com/nlpkeg/Know-MRI. We also provide a demonstration video at https://youtu.be/NVWZABJ43Bs.
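To make the core-module idea concrete, here is a minimal sketch of how automatic input-to-method matching and output consolidation could work. This is a hypothetical illustration of the pattern described in the abstract, not Know-MRI's actual API: InterpretMethod, register, and diagnose are all invented names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch of an "extensible core module" that matches input
# data formats to compatible interpretation methods and consolidates
# their outputs. Names are illustrative, not Know-MRI's real API.

@dataclass
class InterpretMethod:
    name: str
    accepts: set                  # input formats this method can handle
    run: Callable[[dict], dict]   # takes input data, returns a result

REGISTRY: List[InterpretMethod] = []

def register(method: InterpretMethod) -> None:
    """Add a new interpretation method; the module stays extensible."""
    REGISTRY.append(method)

def diagnose(data: dict) -> Dict[str, dict]:
    """Match the input's format against every registered method and
    consolidate all compatible outputs into one report."""
    fmt = data["format"]
    return {m.name: m.run(data) for m in REGISTRY if fmt in m.accepts}

# Two toy methods standing in for real interpreters (e.g., causal
# tracing over knowledge triples, attention analysis over QA pairs).
register(InterpretMethod("causal_tracing", {"triple"},
                         lambda d: {"located_layers": [5, 6]}))
register(InterpretMethod("attention_analysis", {"triple", "qa"},
                         lambda d: {"salient_heads": [(10, 3)]}))

report = diagnose({"format": "triple", "subject": "Paris",
                   "relation": "capital_of", "object": "France"})
print(report)  # both triple-compatible methods contribute to one report
```

The registry pattern is what keeps such a module extensible: adding a new interpretation method is a single register call, and any input whose format the method accepts is routed to it automatically.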
Related papers
- Introspective Growth: Automatically Advancing LLM Expertise in Technology Judgment [0.0]
Large language models (LLMs) increasingly demonstrate signs of conceptual understanding, yet much of their internal knowledge remains latent, loosely structured, and difficult to access or evaluate. We propose self-questioning as a lightweight and scalable strategy to improve LLMs' understanding.
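As a rough illustration of the self-questioning strategy, the sketch below has a model generate probing questions about a concept and then answer them; ask_model is a hypothetical stand-in for any LLM call and is not an API from the paper.

```python
# Hypothetical sketch of a self-questioning loop: the model surfaces its
# latent knowledge by generating and then answering its own sub-questions.

def ask_model(prompt: str) -> str:
    # Placeholder: swap in a real LLM client (local model or API).
    return ("What problem does it solve?\n"
            "Where does it fail?\n"
            "How is it evaluated?")

def self_question(concept: str, n_questions: int = 3) -> dict:
    raw = ask_model(f"List {n_questions} probing questions about: {concept}")
    questions = [q.strip("- ").strip() for q in raw.splitlines() if q.strip()]
    # The resulting Q/A pairs can be fed back as context for downstream tasks.
    return {q: ask_model(q) for q in questions[:n_questions]}

print(self_question("sparse autoencoders"))
```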
arXiv Detail & Related papers (2025-05-18T15:04:02Z)
- Unveiling Knowledge Utilization Mechanisms in LLM-based Retrieval-Augmented Generation [77.10390725623125]
Retrieval-augmented generation (RAG) is widely employed to expand LLMs' knowledge scope. Since RAG has shown promise in knowledge-intensive tasks like open-domain question answering, its broader application to complex tasks and intelligent assistants has further advanced its utility. We present a systematic investigation of the intrinsic mechanisms by which RAG models integrate internal (parametric) and external (retrieved) knowledge.
arXiv Detail & Related papers (2025-05-17T13:13:13Z)
- How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks. We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z)
- How LLMs Learn: Tracing Internal Representations with Sparse Autoencoders [30.36521888592164]
Large Language Models (LLMs) demonstrate remarkable multilingual capabilities and broad knowledge. We analyze how the information encoded in LLMs' internal representations evolves during the training process.
arXiv Detail & Related papers (2025-03-09T02:13:44Z)
- A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models [40.67240575271987]
Large Language Models (LLMs) have transformed natural language processing, yet their internal mechanisms remain largely opaque. Mechanistic interpretability has attracted significant attention from the research community as a means to understand the inner workings of LLMs. Sparse Autoencoders (SAEs) have emerged as a promising method due to their ability to disentangle the complex, superimposed features within LLMs into more interpretable components.
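To make the SAE technique concrete: a sparse autoencoder maps a model's hidden states into an overcomplete feature space under a sparsity penalty and then reconstructs them, so that superimposed features separate into individually interpretable directions. The sketch below is a generic minimal implementation with illustrative sizes and penalty weight, not any specific paper's recipe.

```python
import torch
import torch.nn as nn

# Minimal sparse autoencoder: an overcomplete encoder/decoder trained to
# reconstruct hidden states under an L1 sparsity penalty. Sizes and the
# penalty weight (1e-3) are illustrative.

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_dict: int = 8 * 768):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, h: torch.Tensor):
        f = torch.relu(self.encoder(h))  # sparse feature activations
        return self.decoder(f), f        # reconstruction and features

sae = SparseAutoencoder()
h = torch.randn(32, 768)                 # a batch of residual-stream states
h_hat, f = sae(h)
loss = ((h - h_hat) ** 2).mean() + 1e-3 * f.abs().mean()  # recon + sparsity
loss.backward()
```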
arXiv Detail & Related papers (2025-03-07T17:38:00Z)
- Do Large Language Models Know How Much They Know? [18.566430365358556]
Large Language Models (LLMs) have emerged as highly capable systems, and a desired attribute of an intelligent system is the ability to recognize the scope of its own knowledge. The benchmark introduced here evaluates whether models recall excessive, insufficient, or precisely the right amount of information.
arXiv Detail & Related papers (2025-02-26T21:33:06Z)
- Explainable artificial intelligence (XAI): from inherent explainability to large language models [0.0]
Explainable AI (XAI) techniques facilitate the explainability or interpretability of machine learning models. This paper details the advancements of explainable AI methods, from inherently interpretable models to modern approaches. We review explainable AI techniques that leverage vision-language model (VLM) frameworks to automate or improve the explainability of other machine learning models.
arXiv Detail & Related papers (2025-01-17T06:16:57Z)
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- Interpretable and Explainable Machine Learning Methods for Predictive Process Monitoring: A Systematic Literature Review [1.3812010983144802]
This paper presents a systematic review of the explainability and interpretability of machine learning (ML) models within the context of predictive process mining.
We provide a comprehensive overview of the current methodologies and their applications across various application domains.
Our findings aim to equip researchers and practitioners with a deeper understanding of how to develop and implement more trustworthy, transparent, and effective intelligent systems for process analytics.
arXiv Detail & Related papers (2023-12-29T12:43:43Z)
- MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning [50.40151403246205]
Huge language models (LMs) have ushered in a new era for AI, serving as a gateway to natural-language-based knowledge tasks.
We define a flexible architecture with multiple neural models, complemented by discrete knowledge and reasoning modules.
We describe this neuro-symbolic architecture, dubbed the Modular Reasoning, Knowledge and Language (MRKL) system.
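The routing idea behind MRKL can be sketched in a few lines: discrete modules handle the queries they can verify, and everything else falls back to a neural LM. The router, the calculator expert, and llm_fallback below are invented for illustration and are not the authors' implementation.

```python
import re

# Toy MRKL-style router: a discrete expert handles what it can verify;
# everything else falls through to a neural LM (stubbed out here).

def calculator(query: str) -> str:
    expr = re.sub(r"[^0-9+\-*/(). ]", "", query)  # keep only arithmetic
    return str(eval(expr))  # fine for a demo; never eval untrusted input

def llm_fallback(query: str) -> str:
    return f"[LM would answer: {query!r}]"  # stand-in for a real model call

EXPERTS = [
    (lambda q: re.fullmatch(r"[0-9+\-*/(). ]+", q) is not None, calculator),
]

def route(query: str) -> str:
    for matches, expert in EXPERTS:
        if matches(query):
            return expert(query)
    return llm_fallback(query)

print(route("12 * (3 + 4)"))       # -> 84, via the discrete module
print(route("Who wrote Hamlet?"))  # -> handled by the neural fallback
```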
arXiv Detail & Related papers (2022-05-01T11:01:28Z)
- Panoramic Learning with A Standardized Machine Learning Formalism [116.34627789412102]
This paper presents a standardized equation of the learning objective that offers a unifying understanding of diverse ML algorithms.
It also provides guidance for the mechanical design of new ML solutions, and serves as a promising vehicle towards panoramic learning with all experiences.
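For orientation, the unifying objective in this line of work trades off an uncertainty term on an auxiliary distribution, a divergence to the model distribution, and an expected "experience" term. The LaTeX below is a hedged reconstruction of that general shape from memory, not a quotation from the paper; q is the auxiliary distribution, p_θ the model, f an experience function, and α, β trade-off weights.

```latex
% Hedged reconstruction (general shape only) of the standardized objective:
% entropy of q, divergence between q and the model p_theta, and the
% expected experience f, weighted by alpha and beta.
\min_{q,\,\theta}\; -\alpha\,\mathbb{H}(q)
  \;+\; \beta\,\mathbb{D}\!\left(q,\, p_\theta\right)
  \;-\; \mathbb{E}_{q}\!\left[f\right]
```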
arXiv Detail & Related papers (2021-08-17T17:44:38Z)
- A general framework for scientifically inspired explanations in AI [76.48625630211943]
We instantiate the concept of the structure of scientific explanations as the theoretical underpinning for a general framework in which explanations for AI systems can be implemented.
This framework aims to provide the tools to build a "mental model" of any AI system, so that interaction with the user can provide information on demand and come closer to the nature of human-made explanations.
arXiv Detail & Related papers (2020-03-02T10:32:21Z)