Related papers: Prediction is not Explanation: Revisiting the Explanatory Capacity of Mapping Embeddings

URL: http://arxiv.org/abs/2508.13729v1
Date: Tue, 19 Aug 2025 11:00:47 GMT
Title: Prediction is not Explanation: Revisiting the Explanatory Capacity of Mapping Embeddings
Authors: Hanna Herasimchyk, Alhassan Abdelhalim, Sören Laue, Michaela Regneri
Abstract summary: This paper examines common methods to explain the knowledge encoded in word embeddings. Prediction accuracy alone does not reliably indicate genuine feature-based interpretability.
Score: 6.291731291478243
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Understanding what knowledge is implicitly encoded in deep learning models is essential for improving the interpretability of AI systems. This paper examines common methods to explain the knowledge encoded in word embeddings, which are core elements of large language models (LLMs). These methods typically involve mapping embeddings onto collections of human-interpretable semantic features, known as feature norms. Prior work assumes that accurately predicting these semantic features from the word embeddings implies that the embeddings contain the corresponding knowledge. We challenge this assumption by demonstrating that prediction accuracy alone does not reliably indicate genuine feature-based interpretability. We show that these methods can successfully predict even random information, concluding that the results are predominantly determined by an algorithmic upper bound rather than meaningful semantic representation in the word embeddings. Consequently, comparisons between datasets based solely on prediction performance do not reliably indicate which dataset is better captured by the word embeddings. Our analysis illustrates that such mappings primarily reflect geometric similarity within vector spaces rather than indicating the genuine emergence of semantic properties.
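The abstract's central caution can be illustrated with a small synthetic sketch. This is not the authors' experiment: the abstract does not describe their random-feature construction, so the data, the ridge mapping, and the helper name `heldout_r2` below are all assumptions made for illustration. The sketch fits a linear map from stand-in "embeddings" to two kinds of meaningless targets: one that is a random function of the vector geometry, and one that is unrelated noise.

```python
import numpy as np

def heldout_r2(X, Y, n_train, ridge=1e-3):
    """Fit a ridge map X -> Y on the first n_train rows; return R^2 on the rest.

    Hypothetical helper for this sketch, not taken from the paper.
    """
    X_tr, Y_tr = X[:n_train], Y[:n_train]
    X_te, Y_te = X[n_train:], Y[n_train:]
    dim = X.shape[1]
    # Closed-form ridge regression weights.
    W = np.linalg.solve(X_tr.T @ X_tr + ridge * np.eye(dim), X_tr.T @ Y_tr)
    resid = Y_te - X_te @ W
    # Baseline: predicting the training mean of each feature.
    total = Y_te - Y_tr.mean(axis=0)
    return 1.0 - (resid ** 2).sum() / (total ** 2).sum()

rng = np.random.default_rng(0)
n_words, dim, n_feats = 500, 50, 10

# Synthetic stand-in for word embeddings (assumption: Gaussian vectors).
X = rng.normal(size=(n_words, dim))

# Targets with no semantic meaning that still depend on the geometry:
# a random linear projection of the embeddings plus a little noise.
Y_geom = X @ rng.normal(size=(dim, n_feats)) + 0.1 * rng.normal(size=(n_words, n_feats))

# Targets with no relation to the embeddings at all.
Y_noise = rng.normal(size=(n_words, n_feats))

r2_geom = heldout_r2(X, Y_geom, n_train=400)
r2_noise = heldout_r2(X, Y_noise, n_train=400)
print(f"held-out R^2, geometry-tied random targets: {r2_geom:.3f}")
print(f"held-out R^2, unrelated random targets:     {r2_noise:.3f}")
```

In this toy setting the geometry-tied targets are predicted well on held-out rows even though they carry no semantic content, which is one way a high prediction score can fail to indicate encoded knowledge. Reporting such a control next to any feature-norm score gives the score a reference point.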

Related papers

Improving Semantic Uncertainty Quantification in LVLMs with Semantic Gaussian Processes [60.75226150503949]
We propose a Bayesian framework that quantifies semantic uncertainty by analyzing the geometric structure of answer embeddings. SGPU maps generated answers into a dense semantic space, computes the Gram matrix of their semantic embeddings, and summarizes their semantic configuration. We show that SGPU transfers across models and modalities, indicating that its spectral representation captures general patterns of semantic uncertainty.
arXiv Detail & Related papers (2025-12-16T08:15:24Z)
Semantic and Structural Analysis of Implicit Biases in Large Language Models: An Interpretable Approach [1.5749416770494704]
It proposes an interpretable bias detection method aimed at identifying hidden social biases in model outputs. The method combines nested semantic representation with a contextual contrast mechanism. The evaluation focuses on several key metrics, such as bias detection accuracy, semantic consistency, and contextual sensitivity.
arXiv Detail & Related papers (2025-08-08T09:21:10Z)
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z)
Explaining the Unexplained: Revealing Hidden Correlations for Better Interpretability [1.8274323268621635]
Real Explainer (RealExp) is an interpretability method that decouples the Shapley Value into individual feature importance and feature correlation importance. RealExp enhances interpretability by precisely quantifying both individual feature contributions and their interactions.
arXiv Detail & Related papers (2024-12-02T10:50:50Z)
kNN Classification of Malware Data Dependency Graph Features [0.0]
This study obtains accurate classification from the use of features tied to structure and semantics. By training an accurate model using labeled data, this feature representation of semantics is shown to be correlated with ground truth labels. Our results provide evidence that data dependency graphs accurately capture both semantic and structural information for increased explainability in classification results.
arXiv Detail & Related papers (2024-06-04T16:39:02Z)
Pre-training and Diagnosing Knowledge Base Completion Models [58.07183284468881]
We introduce and analyze an approach to knowledge transfer from one collection of facts to another without the need for entity or relation matching. The main contribution is a method that can make use of large-scale pre-training on facts, which were collected from unstructured text. To understand the obtained pre-trained models better, we then introduce a novel dataset for the analysis of pre-trained models for Open Knowledge Base Completion.
arXiv Detail & Related papers (2024-01-27T15:20:43Z)
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
Saliency Map Verbalization: Comparing Feature Importance Representations from Model-free and Instruction-based Methods [6.018950511093273]
Saliency maps can explain a neural model's predictions by identifying important input features. We formalize the underexplored task of translating saliency maps into natural language. We compare two novel methods (search-based and instruction-based verbalizations) against conventional feature importance representations.
arXiv Detail & Related papers (2022-10-13T17:48:15Z)
Robust Semantic Interpretability: Revisiting Concept Activation Vectors [0.0]
Interpretability methods for image classification attempt to expose whether the model is systematically biased or attending to the same cues as a human would. Our proposed Robust Concept Activation Vectors (RCAV) quantifies the effects of semantic concepts on individual model predictions and on model behavior as a whole.
arXiv Detail & Related papers (2021-04-06T20:14:59Z)
Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data. We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations. Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data. We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantee to learn a good representation. We prove the linear layer yields small approximation error even for complex ground truth function class.
arXiv Detail & Related papers (2020-08-03T17:56:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.