BELL: Benchmarking the Explainability of Large Language Models
- URL: http://arxiv.org/abs/2504.18572v1
- Date: Tue, 22 Apr 2025 11:15:23 GMT
- Title: BELL: Benchmarking the Explainability of Large Language Models
- Authors: Syed Quiser Ahmed, Bharathi Vokkaliga Ganesh, Jagadish Babu P, Karthick Selvaraj, ReddySiva Naga Parvathi Devi, Sravya Kappala
- Abstract summary: Large Language Models have demonstrated remarkable capabilities in natural language processing, yet their decision-making processes often lack transparency. This paper introduces a standardised benchmarking technique, Benchmarking the Explainability of Large Language Models, designed to evaluate the explainability of large language models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models have demonstrated remarkable capabilities in natural language processing, yet their decision-making processes often lack transparency. This opacity raises significant concerns regarding trust, bias, and model performance. To address these issues, understanding and evaluating the interpretability of LLMs is crucial. This paper introduces a standardised benchmarking technique, Benchmarking the Explainability of Large Language Models (BELL), designed to evaluate the explainability of large language models.
Related papers
- LExT: Towards Evaluating Trustworthiness of Natural Language Explanations [10.77745803401336]
We propose a framework for quantifying trustworthiness of natural language explanations, balancing Plausibility and Faithfulness.
Applying our domain-agnostic framework to the healthcare domain using public medical datasets, we evaluate six models.
Our findings demonstrate significant differences in their ability to generate trustworthy explanations.
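For intuition only, and as an illustrative assumption rather than LExT's published scoring rule, one simple way to balance two bounded scores is a harmonic mean, so that neither plausibility nor faithfulness can compensate for a deficit in the other:

    T = \frac{2 \, P \, F}{P + F}, \qquad P, F \in [0, 1]

where P and F denote plausibility and faithfulness scores and T is the combined trustworthiness score.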
arXiv Detail & Related papers (2025-04-08T17:16:52Z)
- XForecast: Evaluating Natural Language Explanations for Time Series Forecasting [72.57427992446698]
Time series forecasting aids decision-making, especially for stakeholders who rely on accurate predictions.
Traditional explainable AI (XAI) methods, which underline feature or temporal importance, often require expert knowledge.
Evaluating forecast natural language explanations (NLEs) is difficult due to the complex causal relationships in time series data.
arXiv Detail & Related papers (2024-10-18T05:16:39Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
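A minimal sketch of this kind of explanation-based uncertainty, under the assumption that one samples several explanation-answer pairs and measures how stable the final answers are; this is not the authors' implementation, and generate_explanation_and_answer is a hypothetical stand-in for an LLM call.

```python
import math
from collections import Counter

def generate_explanation_and_answer(question: str, seed: int) -> tuple[str, str]:
    """Hypothetical stand-in for sampling an explanation and a final answer
    from an LLM at non-zero temperature (placeholder, not a real model call)."""
    canned = [("2 + 2 equals 4 because two pairs of two sum to four.", "4"),
              ("Adding 2 and 2 gives 4.", "4")]
    return canned[seed % len(canned)]

def explanation_based_uncertainty(question: str, n_samples: int = 8) -> float:
    """Entropy of the answer distribution induced by sampled explanations;
    higher entropy means less stable explanations, i.e. lower confidence."""
    answers = [generate_explanation_and_answer(question, s)[1] for s in range(n_samples)]
    probs = [count / n_samples for count in Counter(answers).values()]
    return -sum(p * math.log(p) for p in probs)

print(explanation_based_uncertainty("What is 2 + 2?"))  # 0.0 for perfectly stable answers
```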
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- FaithLM: Towards Faithful Explanations for Large Language Models [67.29893340289779]
Large Language Models (LLMs) have become proficient in addressing complex tasks by leveraging their internal knowledge and reasoning capabilities.
The black-box nature of these models complicates the task of explaining their decision-making processes.
We introduce FaithLM to explain the decisions of LLMs with natural language (NL) explanations.
arXiv Detail & Related papers (2024-02-07T09:09:14Z)
- From Understanding to Utilization: A Survey on Explainability for Large Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
arXiv Detail & Related papers (2024-01-23T16:09:53Z)
- Proto-lm: A Prototypical Network-Based Framework for Built-in Interpretability in Large Language Models [27.841725567976315]
Large Language Models (LLMs) have significantly advanced the field of Natural Language Processing (NLP), but their lack of interpretability has been a major concern.
In this work, we introduce proto-lm, a prototypical network-based white-box framework that allows LLMs to learn immediately interpretable embeddings.
Our method's applicability and interpretability are demonstrated through experiments on a wide range of NLP tasks, and our results indicate a new possibility of creating interpretable models without sacrificing performance.
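A minimal sketch of the general prototypical-network idea behind such built-in interpretability, under the assumption of a frozen sentence embedder and learned per-class prototypes; this illustrates the technique generically, not proto-lm's actual architecture, and embed is a hypothetical placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy setup: 2 classes, 3 prototypes per class, 4-dimensional embedding space.
prototypes = rng.normal(size=(2, 3, 4))  # in a real model these are learned

def embed(text: str) -> np.ndarray:
    """Hypothetical placeholder for the model's sentence embedding."""
    return rng.normal(size=4)

def predict_with_explanation(text: str) -> tuple[int, int, float]:
    """Classify by distance to prototypes; the nearest prototype's index and
    distance double as a built-in explanation of the prediction."""
    z = embed(text)
    dists = np.linalg.norm(prototypes - z, axis=-1)  # shape (2, 3)
    cls = int(dists.min(axis=1).argmin())
    proto = int(dists[cls].argmin())
    return cls, proto, float(dists[cls, proto])

print(predict_with_explanation("The movie was wonderful."))
```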
arXiv Detail & Related papers (2023-11-03T05:55:32Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations reveal more fully how well language models understand the questions they are asked.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
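A minimal sketch of the general recipe implied above: perturb a prompt at the word level, answer both versions, and compare scores from a reward model. This is an illustration under stated assumptions, not the paper's evaluation pipeline; generate_answer and reward_score are hypothetical placeholders.

```python
import random

def generate_answer(question: str) -> str:
    """Hypothetical placeholder for an LLM answering the question."""
    return "Paris is the capital of France." if "capital" in question else "I am not sure."

def reward_score(question: str, answer: str) -> float:
    """Hypothetical placeholder for a pre-trained reward model rating the answer."""
    return 1.0 if "Paris" in answer else 0.0

def word_level_perturb(question: str, drop_prob: float = 0.2, seed: int = 0) -> str:
    """Randomly drop words to mimic the noisy phrasing of everyday usage."""
    rng = random.Random(seed)
    kept = [w for w in question.split() if rng.random() > drop_prob]
    return " ".join(kept) if kept else question

question = "What is the capital city of France?"
perturbed = word_level_perturb(question)
drop = reward_score(question, generate_answer(question)) - \
       reward_score(perturbed, generate_answer(perturbed))
print(f"perturbed prompt: {perturbed!r}, reward drop: {drop:.2f}")
```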
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Large Language Models Are Not Strong Abstract Reasoners [12.354660792999269]
Large Language Models have shown tremendous performance on a variety of natural language processing tasks.
It is unclear whether LLMs can achieve human-like cognitive capabilities or whether these models are still fundamentally circumscribed.
We introduce a new benchmark for evaluating language models beyond memorization on abstract reasoning tasks.
arXiv Detail & Related papers (2023-05-31T04:50:29Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
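Schematically, and in our notation rather than the paper's, the Bayesian view treats the demonstrations D as evidence about a latent task variable \theta, so the prediction for a test input x marginalises over that variable:

    p(y \mid x, D) = \int p(y \mid x, \theta) \, p(\theta \mid D) \, d\theta

Under this reading, good demonstrations are those that concentrate the implicit posterior p(\theta \mid D) on the intended task.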
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
- Emergent Linguistic Structures in Neural Networks are Fragile [20.692540987792732]
Large Language Models (LLMs) have been reported to have strong performance on natural language processing tasks.
We propose a framework to assess the consistency and robustness of linguistic representations.
arXiv Detail & Related papers (2022-10-31T15:43:57Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
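Schematically (a gloss on the idea rather than a verbatim restatement of the paper's method), a contrastive explanation attributes the model's preference for a target token y_t over a foil y_f, for example by explaining the log-probability difference instead of the target's score alone:

    \Delta(x) = \log p(y_t \mid x) - \log p(y_f \mid x)

Saliency or erasure attributions computed for \Delta(x) then highlight the input features that favour the target specifically over the foil (e.g. the subject's number when the foil is a wrongly inflected verb form).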
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.