Explaining Black-box Language Models with Knowledge Probing Systems: A Post-hoc Explanation Perspective
- URL: http://arxiv.org/abs/2508.16969v1
- Date: Sat, 23 Aug 2025 09:41:59 GMT
- Title: Explaining Black-box Language Models with Knowledge Probing Systems: A Post-hoc Explanation Perspective
- Authors: Yunxiao Zhao, Hao Xu, Zhiqiang Wang, Xiaoli Li, Jiye Liang, Ru Li
- Abstract summary: Pre-trained Language Models (PLMs) are trained on large amounts of unlabeled data, yet they exhibit remarkable reasoning skills. This paper proposes KnowProb, a novel knowledge-guided probing approach for post-hoc explanation.
- Score: 43.267605279424686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained Language Models (PLMs) are trained on large amounts of unlabeled data, yet they exhibit remarkable reasoning skills. However, the trustworthiness challenges posed by these black-box models have become increasingly evident in recent years. To alleviate this problem, this paper proposes KnowProb, a novel knowledge-guided probing approach for post-hoc explanation, which probes whether black-box PLMs understand implicit knowledge beyond the given text rather than only its surface-level content. We provide six potential explanations derived from the underlying content of the given text: three grounded in knowledge-based understanding and three in association-based reasoning. In experiments, we validate that current small-scale and large-scale PLMs learn only a single distribution of representation and still face significant challenges in capturing the hidden knowledge behind a given text. Furthermore, we demonstrate that our approach is effective for identifying the limitations of existing black-box models from multiple probing perspectives, which helps researchers advance the study of probing black-box models in an explainable way.
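The abstract describes the probing goal but gives no implementation detail. The gist of a knowledge probing system, checking whether a black-box PLM can supply an implicit fact rather than just echo surface text, can still be illustrated with a minimal cloze-style probe. The sketch below is a simplified stand-in, not KnowProb itself: the model, the templates, and the hit@5 criterion are all illustrative assumptions.

```python
# Minimal cloze-style knowledge probe (illustrative sketch, not the KnowProb system).
# Assumptions: bert-base-uncased stands in for the black-box PLM; the hand-written
# templates stand in for the paper's knowledge-based / association-based explanations.
from transformers import pipeline

probe = pipeline("fill-mask", model="bert-base-uncased")

# Each template pairs surface text with an implicit fact the model must supply.
templates = [
    ("Paris is the capital of [MASK].", "france"),                    # knowledge-based
    ("Water freezes at zero degrees [MASK].", "celsius"),             # knowledge-based
    ("A violinist most likely performs in an [MASK].", "orchestra"),  # association-based
]

for text, gold in templates:
    predictions = probe(text, top_k=5)
    hit = any(p["token_str"].strip().lower() == gold for p in predictions)
    print(f"{text!r}: expected {gold!r}, hit@5 = {hit}")
```

A probe that succeeds on surface paraphrases but fails on templates like these suggests the model has learned textual patterns without the implicit knowledge behind them, which is the kind of limitation the paper reports.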
Related papers
- Investigating the Duality of Interpretability and Explainability in Machine Learning [2.8311451575532156]
Complex "black box" models exhibit exceptional predictive performance.<n>Their inherently opaque nature raises concerns about transparency and interpretability.<n>Efforts are focused on explaining these models instead of developing ones that are inherently interpretable.
arXiv Detail & Related papers (2025-03-27T10:48:40Z)
- Knowledge Boundary of Large Language Models: A Survey [75.67848187449418]
Large language models (LLMs) store vast amounts of knowledge in their parameters, but they still have limitations in the memorization and utilization of certain knowledge. This highlights the critical need to understand the knowledge boundary of LLMs, a concept that remains inadequately defined in existing research. We propose a comprehensive definition of the LLM knowledge boundary and introduce a formalized taxonomy categorizing knowledge into four distinct types.
arXiv Detail & Related papers (2024-12-17T02:14:02Z)
- Explainable Few-shot Knowledge Tracing [48.877979333221326]
We propose a cognition-guided framework that can track student knowledge from a few student records while providing natural language explanations.
Experimental results from three widely used datasets show that LLMs can perform comparably to, or better than, competitive deep knowledge tracing methods.
arXiv Detail & Related papers (2024-05-23T10:07:21Z)
- Does It Make Sense to Explain a Black Box With Another Black Box? [5.377278489623063]
The literature distinguishes two main families of counterfactual explanation methods: (a) *transparent* methods, which perturb the target by adding, removing, or replacing words, and (b) *opaque* approaches, which project the target document into a latent, non-interpretable space where the perturbation is then carried out.
Our empirical evidence shows that opaque approaches can be overkill for downstream applications such as fake news detection or sentiment analysis, since they add a level of complexity with no significant performance gain.
arXiv Detail & Related papers (2024-04-23T11:40:30Z)
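To make the distinction drawn above concrete, here is a minimal sketch of the transparent family: a greedy word-replacement search that stops as soon as the classifier's label flips. The substitution table and the greedy loop are simplifying assumptions; published transparent methods use curated lexical resources and semantic constraints rather than a hand-written dictionary.

```python
# Toy "transparent" counterfactual search: replace words until the label flips.
# Illustrative only; the default sentiment model and substitution table are assumptions.
from transformers import pipeline

clf = pipeline("sentiment-analysis")  # downloads a default sentiment classifier

def label(text: str) -> str:
    return clf(text)[0]["label"]

def transparent_counterfactual(text: str, substitutions: dict) -> str | None:
    original = label(text)
    words = text.split()
    for i, word in enumerate(words):
        for sub in substitutions.get(word.lower(), []):
            candidate = " ".join(words[:i] + [sub] + words[i + 1:])
            if label(candidate) != original:  # label flipped: counterfactual found
                return candidate
    return None

subs = {"love": ["hate"], "great": ["terrible"]}  # hand-written substitution table
print(transparent_counterfactual("I love this movie", subs))  # e.g. "I hate this movie"
```

Every edit in the output is a visible word change, which is exactly the interpretability property the paper contrasts with latent-space (opaque) perturbation.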
- A Survey of Explainable Knowledge Tracing [14.472784840283099]
This paper thoroughly analyzes the interpretability of knowledge tracing (KT) algorithms.
Current evaluation methods for explainable knowledge tracing are lacking.
This paper offers some insights into evaluation methods from the perspective of educational stakeholders.
arXiv Detail & Related papers (2024-03-12T03:17:59Z)
- Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models [121.83413400686139]
This paper proposes to improve the visual perception ability of MLLMs through a mixture-of-experts knowledge enhancement mechanism.
We introduce a novel method that incorporates multi-task encoders and visual tools into the existing MLLMs training and inference pipeline.
arXiv Detail & Related papers (2024-01-06T02:02:34Z)
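The abstract above names a mixture-of-experts knowledge enhancement mechanism without architectural detail, so the following PyTorch sketch shows only the generic MoE fusion pattern: a learned gate softly routes an input representation across several expert transforms. The dimensions, the linear experts, and the fusion point are all assumptions, not the paper's design.

```python
# Generic mixture-of-experts fusion (the general pattern only; the paper's
# multi-task visual experts and their integration points are assumptions here).
import torch
import torch.nn as nn

class MoEFusion(nn.Module):
    def __init__(self, dim: int = 512, n_experts: int = 3):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)  # learned soft routing over experts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)            # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], -1)  # (batch, dim, n_experts)
        return (outputs * weights.unsqueeze(1)).sum(-1)          # weighted fusion

print(MoEFusion()(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```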
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
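A concept-bottleneck interpretation routes the classifier's decision through a small layer of human-nameable concept scores, so an explanation can be read off the bottleneck activations. The PyTorch sketch below shows the general pattern only; the frozen-encoder setup, layer sizes, sigmoid concept scores, and concept count are assumptions rather than the paper's exact architecture.

```python
# Toy concept-bottleneck head for a PLM (general pattern, not the paper's method).
import torch
import torch.nn as nn

class ConceptBottleneckHead(nn.Module):
    def __init__(self, hidden_size: int = 768, n_concepts: int = 8, n_labels: int = 2):
        super().__init__()
        self.to_concepts = nn.Linear(hidden_size, n_concepts)  # concept scores
        self.to_label = nn.Linear(n_concepts, n_labels)        # label depends on concepts only

    def forward(self, cls_embedding: torch.Tensor):
        concepts = torch.sigmoid(self.to_concepts(cls_embedding))
        return self.to_label(concepts), concepts  # concepts double as the explanation

head = ConceptBottleneckHead()
fake_cls = torch.randn(1, 768)  # stand-in for a PLM's [CLS] embedding
logits, concepts = head(fake_cls)
print(concepts)                 # one activation per human-named concept
```

Because the label is computed from the concept scores alone, inspecting (or editing) those scores gives a direct, human-level account of the prediction.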
- Towards LLM-guided Causal Explainability for Black-box Text Classifiers [16.36602400590088]
We aim to leverage the instruction-following and textual understanding capabilities of recent Large Language Models to facilitate causal explainability.
We propose a three-step pipeline in which we use an off-the-shelf LLM to identify the latent or unobserved features in the input text.
We experiment with our pipeline on multiple NLP text classification datasets, and present interesting and promising findings.
arXiv Detail & Related papers (2023-09-23T11:22:28Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from an external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
- Interpretation of Black Box NLP Models: A Survey [0.0]
Post hoc explanations based on perturbations are a widely used approach to interpreting a machine learning model after it has been built.
We propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem to determine the number of perturbation points needed to guarantee stability of the resulting explanation.
arXiv Detail & Related papers (2022-03-31T14:54:35Z)
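The central-limit-theorem idea behind S-LIME can be illustrated with a back-of-the-envelope sample-size rule: draw enough perturbation points that a feature-importance estimate is pinned down within a chosen margin. The sketch below is a simplification under that assumption; the published S-LIME procedure applies its hypothesis test within LIME's feature-selection steps rather than with this closed-form bound.

```python
# CLT-style sample-size rule (simplified illustration, not S-LIME's exact test).
import math

def samples_needed(sample_std: float, margin: float, z: float = 1.96) -> int:
    """n such that a mean importance estimate is within `margin` at ~95% confidence."""
    return math.ceil((z * sample_std / margin) ** 2)

# e.g. pilot perturbations give importance scores with std 0.4; we want +/-0.05
print(samples_needed(sample_std=0.4, margin=0.05))  # -> 246 perturbation points
```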