A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models
- URL: http://arxiv.org/abs/2402.04787v1
- Date: Wed, 7 Feb 2024 12:26:12 GMT
- Title: A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models
- Authors: Marc Braun, Jenny Kunz
- Abstract summary: We use a Bayesian network to implement a hypothesis about how a task is solved.
The resulting models do not exhibit a strong similarity to GPT-3.5.
We discuss the implications of this as well as the framework's potential to approximate LLM decisions better in future work.
- Score: 0.8702432681310401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The self-rationalising capabilities of LLMs are appealing because the
generated explanations can give insights into the plausibility of the
predictions. However, how faithful the explanations are to the predictions is
questionable, raising the need to explore the patterns behind them further. To
this end, we propose a hypothesis-driven statistical framework. We use a
Bayesian network to implement a hypothesis about how a task (in our example,
natural language inference) is solved, and its internal states are translated
into natural language with templates. Those explanations are then compared to
LLM-generated free-text explanations using automatic and human evaluations.
This allows us to judge how similar the LLM's and the Bayesian network's
decision processes are. We demonstrate the usage of our framework with an
example hypothesis and two realisations in Bayesian networks. The resulting
models do not exhibit a strong similarity to GPT-3.5. We discuss the
implications of this as well as the framework's potential to approximate LLM
decisions better in future work.
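The pipeline described in the abstract (a Bayesian network encoding a hypothesis, templates that verbalise its states, and a comparison against LLM explanations) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two binary features (`high_overlap`, `negation_mismatch`), the probability tables, the template wording, and the use of token-level Jaccard overlap as a stand-in for an automatic similarity metric. The network is a simple naive-Bayes-shaped structure, not either of the authors' two realisations.

```python
from typing import Dict, Tuple

LABELS = ("entailment", "neutral", "contradiction")

# Hypothetical prior P(label); uniform for illustration only.
PRIOR: Dict[str, float] = {label: 1.0 / 3.0 for label in LABELS}

# Hypothetical conditionals P(feature=True | label); not from the paper.
P_TRUE: Dict[str, Dict[str, float]] = {
    "high_overlap": {"entailment": 0.8, "neutral": 0.4, "contradiction": 0.5},
    "negation_mismatch": {"entailment": 0.1, "neutral": 0.2, "contradiction": 0.7},
}

# Hypothetical templates that verbalise each observed state.
TEMPLATES: Dict[Tuple[str, bool], str] = {
    ("high_overlap", True): "the hypothesis shares most of its words with the premise",
    ("high_overlap", False): "the hypothesis shares few words with the premise",
    ("negation_mismatch", True): "only one of the two sentences is negated",
    ("negation_mismatch", False): "the two sentences agree in polarity",
}


def posterior(evidence: Dict[str, bool]) -> Dict[str, float]:
    """Return P(label | evidence) for the label -> features network."""
    scores = {}
    for label in LABELS:
        p = PRIOR[label]
        for feature, value in evidence.items():
            p_true = P_TRUE[feature][label]
            p *= p_true if value else 1.0 - p_true
        scores[label] = p
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


def explain(evidence: Dict[str, bool]) -> Tuple[str, str]:
    """Pick the most probable label and verbalise the evidence via templates."""
    post = posterior(evidence)
    label = max(post, key=post.get)
    reasons = " and ".join(TEMPLATES[(f, v)] for f, v in evidence.items())
    return label, f"The label is {label} because {reasons}."


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two explanations (a crude similarity proxy)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


if __name__ == "__main__":
    label, template_text = explain({"high_overlap": True, "negation_mismatch": True})
    # Placeholder standing in for an LLM-generated free-text explanation.
    llm_text = "The label is contradiction because one sentence is negated."
    print(label, template_text, jaccard(template_text, llm_text))
```

A low overlap score in such a setup would suggest that the hypothesised decision process and the LLM's stated reasoning diverge, which is the kind of comparison the framework is designed to support.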
Related papers
- Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models [6.922021128239465]
Recent advances in AI have been driven by the capabilities of large language models (LLMs).
This paper introduces a framework that is both theoretical and practical, aimed at assessing how effectively LLMs are able to replicate real-world reasoning mechanisms.
arXiv Detail & Related papers (2024-08-15T15:19:11Z)
- Evaluating the Reliability of Self-Explanations in Large Language Models [2.8894038270224867]
We evaluate two kinds of such self-explanations - extractive and counterfactual.
Our findings reveal that, while these self-explanations can correlate with human judgement, they do not fully and accurately follow the model's decision process.
We show that this gap can be bridged because prompting LLMs for counterfactual explanations can produce faithful, informative, and easy-to-verify results.
arXiv Detail & Related papers (2024-07-19T17:41:08Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Inference to the Best Explanation in Large Language Models [6.037970847418495]
This paper proposes IBE-Eval, a framework inspired by philosophical accounts of Inference to the Best Explanation (IBE).
IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features.
Experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77% accuracy.
arXiv Detail & Related papers (2024-02-16T15:41:23Z)
- Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
- ThinkSum: Probabilistic reasoning over sets using large language models [18.123895485602244]
We propose a two-stage probabilistic inference paradigm, ThinkSum, which reasons over sets of objects or facts in a structured manner.
We demonstrate the possibilities and advantages of ThinkSum on the BIG-bench suite of LLM evaluation tasks.
arXiv Detail & Related papers (2022-10-04T00:34:01Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Explaining Question Answering Models through Text Generation [42.36596190720944]
Large pre-trained language models (LMs) have been shown to perform surprisingly well when fine-tuned on tasks that require commonsense and world knowledge.
In end-to-end architectures, it is difficult to explain what knowledge in the LM allows it to make a correct prediction.
We show on several tasks that our model reaches performance that is comparable to end-to-end architectures.
arXiv Detail & Related papers (2020-04-12T09:06:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.