Clarifying Model Transparency: Interpretability versus Explainability in Deep Learning with MNIST and IMDB Examples
- URL: http://arxiv.org/abs/2509.10929v1
- Date: Sat, 13 Sep 2025 18:06:55 GMT
- Title: Clarifying Model Transparency: Interpretability versus Explainability in Deep Learning with MNIST and IMDB Examples
- Authors: Mitali Raj
- Abstract summary: The document offers a comparative exploration of interpretability and explainability within the deep learning paradigm. It substantiates a key argument: interpretability pertains to a model's inherent capacity for human comprehension of its operational mechanisms, whereas explainability is more commonly associated with post-hoc techniques that illuminate individual predictions. For example, feature attribution methods can reveal why a specific MNIST image is recognized as a '7', and word-level importance can clarify an IMDB sentiment outcome.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The impressive capabilities of deep learning models are often counterbalanced by their inherent opacity, commonly termed the "black box" problem, which impedes their widespread acceptance in high-trust domains. In response, the intersecting disciplines of interpretability and explainability, collectively falling under the Explainable AI (XAI) umbrella, have become focal points of research. Although these terms are frequently used as synonyms, they carry distinct conceptual weights. This document offers a comparative exploration of interpretability and explainability within the deep learning paradigm, carefully outlining their respective definitions, objectives, prevalent methodologies, and inherent difficulties. Through illustrative examinations of the MNIST digit classification task and IMDB sentiment analysis, we substantiate a key argument: interpretability generally pertains to a model's inherent capacity for human comprehension of its operational mechanisms (global understanding), whereas explainability is more commonly associated with post-hoc techniques designed to illuminate the basis for a model's individual predictions or behaviors (local explanations). For example, feature attribution methods can reveal why a specific MNIST image is recognized as a '7', and word-level importance can clarify an IMDB sentiment outcome. However, these local insights do not render the complex underlying model globally transparent. A clear grasp of this differentiation, as demonstrated by these standard datasets, is vital for fostering dependable and sound artificial intelligence.
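To make the MNIST example concrete, the sketch below computes a vanilla-gradient saliency map for a single digit, the simplest form of the feature attribution the abstract refers to. The SmallCNN model, the abbreviated training loop, and the saliency_map helper are illustrative assumptions rather than the setup used in the paper.

```python
# A minimal sketch (not the paper's code) of a post-hoc, gradient-based
# feature-attribution explanation for an MNIST classifier. The SmallCNN
# architecture, the abbreviated training loop, and all names here are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms


class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # 28x28 -> 14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # 14x14 -> 7x7
        return self.fc(x.flatten(1))


def saliency_map(model, image, target_class):
    """|d score_target / d pixel| for one image: a local explanation of a
    single prediction, not a global account of how the model works."""
    model.eval()
    image = image.clone().unsqueeze(0).requires_grad_(True)   # (1, 1, 28, 28)
    score = model(image)[0, target_class]
    score.backward()
    return image.grad.abs().squeeze()                          # (28, 28)


if __name__ == "__main__":
    train_set = datasets.MNIST(".", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

    model = SmallCNN()
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)
    model.train()
    for step, (images, labels) in enumerate(loader):   # brief training, demo only
        optim.zero_grad()
        F.cross_entropy(model(images), labels).backward()
        optim.step()
        if step == 200:
            break

    # Ask why one digit is recognised as, say, a '7': pixels with large
    # gradient magnitude are those whose change would most affect the score.
    image, label = train_set[0]
    saliency = saliency_map(model, image, target_class=label)
    print("true label:", label, "saliency shape:", tuple(saliency.shape))
```

The resulting map answers only the local question of which pixels drove this one prediction; taking the same gradient with respect to word embeddings yields the kind of word-level importances described for IMDB, and in neither case does the attribution make the underlying network globally interpretable.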
Related papers
- How do Transformers Learn Implicit Reasoning? [67.02072851088637]
We study how implicit multi-hop reasoning emerges by training transformers from scratch in a controlled symbolic environment. We find that training with atomic triples is not necessary but accelerates learning, and that second-hop generalization relies on query-level exposure to specific compositional structures.
arXiv Detail & Related papers (2025-05-29T17:02:49Z) - I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z) - Data Science Principles for Interpretable and Explainable AI [0.7581664835990121]
Interpretable and interactive machine learning aims to make complex models more transparent and controllable.
This review synthesizes key principles from the growing literature in this field.
arXiv Detail & Related papers (2024-05-17T05:32:27Z) - Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales [54.78115855552886]
We show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture.
With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner.
For robust and interpretable vision tasks at larger scales, hierarchical invariant representations can be considered an effective alternative to traditional CNNs and invariants.
arXiv Detail & Related papers (2024-02-23T16:50:07Z) - From Understanding to Utilization: A Survey on Explainability for Large Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
arXiv Detail & Related papers (2024-01-23T16:09:53Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z) - Relate to Predict: Towards Task-Independent Knowledge Representations for Reinforcement Learning [11.245432408899092]
Reinforcement learning can enable agents to learn complex tasks.
However, the knowledge an agent acquires is difficult to interpret and to reuse across tasks.
In this paper, we introduce an inductive bias for explicit object-centered knowledge separation.
We show that the degree of explicitness in knowledge separation correlates with faster learning, better accuracy, better generalization, and better interpretability.
arXiv Detail & Related papers (2022-12-10T13:33:56Z) - Interpretable Representations in Explainable AI: From Theory to Practice [7.031336702345381]
Interpretable representations are the backbone of many explainers that target black-box predictive systems.
We study properties of interpretable representations that encode presence and absence of human-comprehensible concepts.
arXiv Detail & Related papers (2020-08-16T21:44:03Z)