Related papers: Towards Transparent AI: A Survey on Explainable Language Models

Towards Transparent AI: A Survey on Explainable Language Models

URL: http://arxiv.org/abs/2509.21631v1
Date: Thu, 25 Sep 2025 21:47:39 GMT
Title: Towards Transparent AI: A Survey on Explainable Language Models
Authors: Avash Palikhe, Zichong Wang, Zhipeng Yin, Rui Guo, Qiang Duan, Jie Yang, Wenbin Zhang,
Abstract summary: Language Models (LMs) have significantly advanced natural language processing and enabled remarkable progress across diverse domains.<n>Lack of transparency is particularly problematic for adoption in high-stakes domains.<n>XAI methods have been well studied for non-LMs, but they face many limitations when applied to LMs.
Score: 22.70051215800476
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language Models (LMs) have significantly advanced natural language processing and enabled remarkable progress across diverse domains, yet their black-box nature raises critical concerns about the interpretability of their internal mechanisms and decision-making processes. This lack of transparency is particularly problematic for adoption in high-stakes domains, where stakeholders need to understand the rationale behind model outputs to ensure accountability. On the other hand, while explainable artificial intelligence (XAI) methods have been well studied for non-LMs, they face many limitations when applied to LMs due to their complex architectures, considerable training corpora, and broad generalization abilities. Although various surveys have examined XAI in the context of LMs, they often fail to capture the distinct challenges arising from the architectural diversity and evolving capabilities of these models. To bridge this gap, this survey presents a comprehensive review of XAI techniques with a particular emphasis on LMs, organizing them according to their underlying transformer architectures: encoder-only, decoder-only, and encoder-decoder, and analyzing how methods are adapted to each while assessing their respective strengths and limitations. Furthermore, we evaluate these techniques through the dual lenses of plausibility and faithfulness, offering a structured perspective on their effectiveness. Finally, we identify open research challenges and outline promising future directions, aiming to guide ongoing efforts toward the development of robust, transparent, and interpretable XAI methods for LMs.

Related papers

Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants [85.33837131101342]
We propose a strategic roadmap organized into four pillars: foundational infrastructure, algorithmic optimization, cognitive reasoning, and unified multimodal intelligence.<n>We argue that this transition is essential for developing next-generation AI capable of complex structural reasoning, dynamic self-correction, and seamless multimodal integration.
arXiv Detail & Related papers (2026-01-20T14:58:23Z)
Towards Transparent AI: A Survey on Explainable Large Language Models [2.443957114877221]
Large Language Models (LLMs) have played a pivotal role in advancing Artificial Intelligence (AI)<n>LLMs often struggle to explain their decision-making processes, making them a 'black box' and presenting a substantial challenge to explainability.<n>To overcome these limitations, researchers have developed various explainable artificial intelligence (XAI) methods.
arXiv Detail & Related papers (2025-06-26T23:25:22Z)
LLMs for Explainable AI: A Comprehensive Survey [0.7373617024876725]
Large Language Models (LLMs) offer a promising approach to enhancing Explainable AI (XAI)<n>LLMs transform complex machine learning outputs into easy-to-understand narratives.<n>LLMs can bridge the gap between sophisticated model behavior and human interpretability.
arXiv Detail & Related papers (2025-03-31T18:19:41Z)
A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration.<n>These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance.<n>This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Integration and Adaptation, which
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models [50.34089812436633]
Large Language Models (LLMs) have transformed natural language processing, yet their internal mechanisms remain largely opaque.<n> mechanistic interpretability has attracted significant attention from the research community as a means to understand the inner workings of LLMs.<n>Sparse Autoencoders (SAEs) have emerged as a promising method due to their ability to disentangle the complex, superimposed features within LLMs into more interpretable components.
arXiv Detail & Related papers (2025-03-07T17:38:00Z)
An Overview of Large Language Models for Statisticians [109.38601458831545]
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI)<n>This paper explores potential areas where statisticians can make important contributions to the development of LLMs.<n>We focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation.
arXiv Detail & Related papers (2025-02-25T03:40:36Z)
Explainable artificial intelligence (XAI): from inherent explainability to large language models [0.0]
Explainable AI (XAI) techniques facilitate the explainability or interpretability of machine learning models.<n>This paper details the advancements of explainable AI methods, from inherently interpretable models to modern approaches.<n>We review explainable AI techniques that leverage vision-language model (VLM) frameworks to automate or improve the explainability of other machine learning models.
arXiv Detail & Related papers (2025-01-17T06:16:57Z)
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future [10.264208559276927]
This review aims to gain key insights into the development of MXAI methods.<n>We categorize MXAI methods across four eras: traditional machine learning, deep learning, discriminative foundation models, and generative LLMs.<n>We also review evaluation metrics and datasets used in MXAI research, concluding with a discussion of future challenges and directions.
arXiv Detail & Related papers (2024-12-18T17:06:21Z)
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey [46.617998833238126]
Large language models (LLMs) and computer vision (CV) systems driving advancements in natural language understanding and visual processing.<n>The convergence of these technologies has catalyzed the rise of multimodal AI, enabling richer, cross-modal understanding that spans text, vision, audio, and video modalities.<n>Multimodal large language models (MLLMs) have emerged as a powerful framework, demonstrating impressive capabilities in tasks like image-text generation, visual question answering, and cross-modal retrieval.<n>Despite these advancements, the complexity and scale of MLLMs introduce significant challenges in interpretability and explainability, essential for establishing
arXiv Detail & Related papers (2024-12-03T02:54:31Z)
Visual Error Patterns in Multi-Modal AI: A Statistical Approach [0.0]
Multi-modal large language models (MLLMs) excel at integrating text and visual data but face systematic challenges when interpreting ambiguous or incomplete visual stimuli.<n>This study leverages statistical modeling to analyze the factors driving these errors, using a dataset of geometric stimuli characterized by features like 3D, rotation, and missing face/side.
arXiv Detail & Related papers (2024-11-27T01:20:08Z)
Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era [76.12435556140515]
XAI is being extended toward explaining Large Language Models (LLMs)<n>This paper analyzes how XAI can explain and improve LLM-based AI systems.<n>We introduce 10 strategies, introducing the key techniques for each and discussing their associated challenges.
arXiv Detail & Related papers (2024-03-13T20:25:27Z)
Beyond Explaining: Opportunities and Challenges of XAI-Based Model Improvement [75.00655434905417]
Explainable Artificial Intelligence (XAI) is an emerging research field bringing transparency to highly complex machine learning (ML) models. This paper offers a comprehensive overview over techniques that apply XAI practically for improving various properties of ML models. We show empirically through experiments on toy and realistic settings how explanations can help improve properties such as model generalization ability or reasoning.
arXiv Detail & Related papers (2022-03-15T15:44:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.