Towards Transparent AI: A Survey on Explainable Large Language Models
- URL: http://arxiv.org/abs/2506.21812v1
- Date: Thu, 26 Jun 2025 23:25:22 GMT
- Title: Towards Transparent AI: A Survey on Explainable Large Language Models
- Authors: Avash Palikhe, Zhenyu Yu, Zichong Wang, Wenbin Zhang
- Abstract summary: Large Language Models (LLMs) have played a pivotal role in advancing Artificial Intelligence (AI). LLMs often struggle to explain their decision-making processes, making them a 'black box' and presenting a substantial challenge to explainability. To overcome these limitations, researchers have developed various explainable artificial intelligence (XAI) methods.
- Score: 2.443957114877221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have played a pivotal role in advancing Artificial Intelligence (AI). However, despite their achievements, LLMs often struggle to explain their decision-making processes, making them a 'black box' and presenting a substantial challenge to explainability. This lack of transparency poses a significant obstacle to the adoption of LLMs in high-stakes domain applications, where interpretability is particularly essential. To overcome these limitations, researchers have developed various explainable artificial intelligence (XAI) methods that provide human-interpretable explanations for LLMs. However, a systematic understanding of these methods remains limited. To address this gap, this survey provides a comprehensive review of explainability techniques by categorizing XAI methods based on the underlying transformer architectures of LLMs: encoder-only, decoder-only, and encoder-decoder models. Then these techniques are examined in terms of their evaluation for assessing explainability, and the survey further explores how these explanations are leveraged in practical applications. Finally, it discusses available resources, ongoing research challenges, and future directions, aiming to guide continued efforts toward developing transparent and responsible LLMs.
Related papers
- Don't Just Translate, Agitate: Using Large Language Models as Devil's Advocates for AI Explanations [1.6855625805565164]
Large Language Models (LLMs) are used to translate outputs from explainability techniques, like feature-attribution weights, into a natural language explanation. Recent findings suggest translating into human-like explanations does not necessarily enhance user understanding and may instead lead to overreliance on AI systems.
arXiv Detail & Related papers (2025-04-16T18:45:18Z)
- LLMs for Explainable AI: A Comprehensive Survey [0.7373617024876725]
Large Language Models (LLMs) offer a promising approach to enhancing Explainable AI (XAI). LLMs transform complex machine learning outputs into easy-to-understand narratives. LLMs can bridge the gap between sophisticated model behavior and human interpretability.
arXiv Detail & Related papers (2025-03-31T18:19:41Z)
- Explainable artificial intelligence (XAI): from inherent explainability to large language models [0.0]
Explainable AI (XAI) techniques facilitate the explainability or interpretability of machine learning models. This paper details the advancements of explainable AI methods, from inherently interpretable models to modern approaches. We review explainable AI techniques that leverage vision-language model (VLM) frameworks to automate or improve the explainability of other machine learning models.
arXiv Detail & Related papers (2025-01-17T06:16:57Z)
- Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making.
Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations.
Our experiments on novel Design for Manufacturing tasks show improved task performance as well as improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z)
- LLMs for XAI: Future Directions for Explaining Explanations [50.87311607612179]
We focus on refining explanations computed using existing XAI algorithms.
Initial experiments and a user study suggest that LLMs offer a promising way to enhance the interpretability and usability of XAI.
arXiv Detail & Related papers (2024-05-09T19:17:47Z)
- Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era [76.12435556140515]
XAI is being extended toward explaining Large Language Models (LLMs). This paper analyzes how XAI can explain and improve LLM-based AI systems. We present 10 strategies, introducing the key techniques for each and discussing their associated challenges.
arXiv Detail & Related papers (2024-03-13T20:25:27Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- From Understanding to Utilization: A Survey on Explainability for Large Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
arXiv Detail & Related papers (2024-01-23T16:09:53Z)
- Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z)
- Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning techniques to assess model performance.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent works aimed at attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.