Towards Uncovering How Large Language Model Works: An Explainability Perspective
- URL: http://arxiv.org/abs/2402.10688v2
- Date: Mon, 15 Apr 2024 18:43:12 GMT
- Title: Towards Uncovering How Large Language Model Works: An Explainability Perspective
- Authors: Haiyan Zhao, Fan Yang, Bo Shen, Himabindu Lakkaraju, Mengnan Du
- Abstract summary: Large language models (LLMs) have led to breakthroughs in language tasks, yet the internal mechanisms that enable their remarkable generalization and reasoning abilities remain opaque.
This paper aims to uncover the mechanisms underlying LLM functionality through the lens of explainability.
- Score: 38.07611356855978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have led to breakthroughs in language tasks, yet the internal mechanisms that enable their remarkable generalization and reasoning abilities remain opaque. This lack of transparency presents challenges such as hallucinations, toxicity, and misalignment with human values, hindering the safe and beneficial deployment of LLMs. This paper aims to uncover the mechanisms underlying LLM functionality through the lens of explainability. First, we review how knowledge is architecturally composed within LLMs and encoded in their internal parameters via mechanistic interpretability techniques. Then, we summarize how knowledge is embedded in LLM representations by leveraging probing techniques and representation engineering. Additionally, we investigate the training dynamics through a mechanistic perspective to explain phenomena such as grokking and memorization. Lastly, we explore how the insights gained from these explanations can enhance LLM performance through model editing, improve efficiency through pruning, and better align with human values.
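As a concrete illustration of the probing techniques the abstract refers to, below is a minimal sketch of a linear probing classifier. The synthetic features stand in for hidden states extracted from one LLM layer; all dimensions and the "feature direction" are illustrative assumptions, not the paper's setup.

```python
# Minimal linear-probe sketch: train a logistic-regression "probe" on
# hidden states to test whether a property (here, a binary label) is
# linearly decodable from a layer's representations.
# The random features below stand in for real LLM hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 2000, 768                      # examples, hidden size of one layer
labels = rng.integers(0, 2, size=n)   # property we probe for
direction = rng.normal(size=d)        # assumed linear "feature direction"
hidden = rng.normal(size=(n, d)) + np.outer(labels - 0.5, direction)

X_tr, X_te, y_tr, y_te = train_test_split(hidden, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")  # >> 0.5 => linearly encoded
```

High probe accuracy on held-out states is the usual evidence that the property is encoded (at least linearly) in that layer's representations.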
Related papers
- Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs [12.48241058167222]
Large Language Models (LLMs) have demonstrated remarkable efficiency in tackling various tasks based on human instructions.
But studies reveal that they often struggle with tasks requiring reasoning, such as math or physics.
This raises questions about whether LLMs truly comprehend embedded knowledge or merely learn to replicate the token distribution without a true understanding of the content.
We propose Deconfounded Causal Adaptation (DCA), a novel parameter-efficient fine-tuning (PEFT) method to enhance the model's reasoning capabilities.
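DCA's construction is detailed in the paper itself; as a rough sketch of the PEFT family it belongs to, here is a generic LoRA-style low-rank adapter (not DCA; the layer shapes, rank, and scaling are illustrative assumptions).

```python
# Illustrative LoRA-style adapter (generic PEFT, not DCA itself):
# freeze the base weight W and learn only a low-rank update B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # frozen pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # start at zero update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))   # only A and B receive gradients
```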
arXiv Detail & Related papers (2024-09-04T13:17:09Z) - Can Large Language Models Understand DL-Lite Ontologies? An Empirical Study [10.051572826948762]
Large language models (LLMs) have shown significant achievements in solving a wide range of tasks.
We empirically analyze the LLMs' capability of understanding Description Logic (DL-Lite) ontologies.
We find that LLMs understand formal syntax and model-theoretic semantics of concepts and roles.
arXiv Detail & Related papers (2024-06-25T13:16:34Z) - Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales [102.54274021830207]
We introduce Fact, a novel paradigm designed to generate multimodal rationales that are faithful, concise, and transferable for teaching MLLMs.
We filter for rationales that can be transferred from programming paradigms to end-to-end paradigms, guaranteeing transferability.
Our approach also reduces hallucinations owing to the high correlation between images and text.
arXiv Detail & Related papers (2024-04-17T07:20:56Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
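As a back-of-the-envelope illustration of the roofline model (with assumed hardware numbers, not figures from the survey): attainable throughput is the minimum of peak compute and bandwidth times arithmetic intensity.

```python
# Back-of-the-envelope roofline check (illustrative hardware numbers):
# attainable FLOP/s = min(peak compute, bandwidth * arithmetic intensity).
peak_flops = 312e12   # assumed ~312 TFLOP/s fp16 peak
bandwidth  = 2.0e12   # assumed ~2 TB/s memory bandwidth

def attainable(intensity_flops_per_byte: float) -> float:
    return min(peak_flops, bandwidth * intensity_flops_per_byte)

# Decode-time LLM inference is memory-bound: a matrix-vector product does
# ~2 FLOPs per weight, i.e. ~1 FLOP per byte read for fp16 weights.
for ai in (1, 10, 156, 1000):   # 156 = ridge point: peak / bandwidth
    frac = attainable(ai) / peak_flops
    print(f"intensity {ai:>4} FLOP/B -> {frac:5.1%} of peak")
```

Below the ridge point the kernel is bandwidth-bound, which is exactly the regime where small-batch LLM decoding sits; this is the kind of bottleneck the framework makes visible.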
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - From Understanding to Utilization: A Survey on Explainability for Large Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
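As a hedged sketch of what activation-level controlled generation can look like in practice (a generic steering-vector hook, not the survey's specific method; the model, layer index, and steering direction are all placeholders):

```python
# Illustrative activation steering: add a "steering vector" to one
# transformer block's hidden states via a forward hook. The random
# vector here is a placeholder for a learned or probed direction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
steer = torch.randn(model.config.n_embd) * 0.1   # placeholder direction

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states
    return (output[0] + steer,) + output[1:]

handle = model.transformer.h[6].register_forward_hook(add_steering)
ids = tok("The movie was", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=10)[0]))
handle.remove()
```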
arXiv Detail & Related papers (2024-01-23T16:09:53Z) - Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
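As a rough illustration of sparsity applied to LLM weights (generic magnitude-based sparsification, not this paper's method; shapes and the sparsity level are illustrative):

```python
# Minimal magnitude-based sparsification sketch: keep only the
# largest-magnitude entries of a weight tensor, zeroing the rest.
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Zero out the smallest-magnitude entries, keeping (1 - sparsity)."""
    k = int(weight.numel() * (1.0 - sparsity))                 # entries to keep
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(768, 768)
w_sparse = magnitude_prune(w, sparsity=0.9)
print(f"kept {(w_sparse != 0).float().mean():.1%} of weights")
```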
arXiv Detail & Related papers (2023-12-22T19:55:58Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)