Related papers: A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models

A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models

URL: http://arxiv.org/abs/2402.04787v1
Date: Wed, 7 Feb 2024 12:26:12 GMT
Title: A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models
Authors: Marc Braun, Jenny Kunz
Abstract summary: We use a Bayesian network to implement a hypothesis about how a task is solved. The resulting models do not exhibit a strong similarity to GPT-3.5. We discuss the implications of this as well as the framework's potential to approximate LLM decisions better in future work.
Score: 0.8702432681310401
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The self-rationalising capabilities of LLMs are appealing because the generated explanations can give insights into the plausibility of the predictions. However, how faithful the explanations are to the predictions is questionable, raising the need to explore the patterns behind them further. To this end, we propose a hypothesis-driven statistical framework. We use a Bayesian network to implement a hypothesis about how a task (in our example, natural language inference) is solved, and its internal states are translated into natural language with templates. Those explanations are then compared to LLM-generated free-text explanations using automatic and human evaluations. This allows us to judge how similar the LLM's and the Bayesian network's decision processes are. We demonstrate the usage of our framework with an example hypothesis and two realisations in Bayesian networks. The resulting models do not exhibit a strong similarity to GPT-3.5. We discuss the implications of this as well as the framework's potential to approximate LLM decisions better in future work.

Related papers

Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations [0.8949668577519213]
Large language models (LLMs) are capable of generating plausible explanations of how they arrived at an answer to a question. These explanations can misrepresent the model's "reasoning" process, i.e., they can be unfaithful. We introduce a new approach for measuring the faithfulness of LLM explanations.
arXiv Detail & Related papers (2025-04-19T02:51:20Z)
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [79.01538178959726]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z)
Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [76.6028674686018]
We introduce thought-tracing, an inference-time reasoning algorithm to trace the mental states of agents. Our algorithm is modeled after the Bayesian theory-of-mind framework. We evaluate thought-tracing on diverse theory-of-mind benchmarks, demonstrating significant performance improvements.
arXiv Detail & Related papers (2025-02-17T15:08:50Z)
Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths [12.377041655669728]
We introduce Lachesis, a predictive model for self-consistency based LLM inferences. We empirically evaluate it using AutoFL, a recently proposed LLM-based fault localisation technique. Results suggest that Lachesis can predict the correctness of answers with a precision of up to 0.8136.
arXiv Detail & Related papers (2024-12-11T10:56:47Z)
Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models [6.922021128239465]
Recent advances in AI have been driven by the capabilities of large language models (LLMs) This paper introduces a framework that is both theoretical and practical, aimed at assessing how effectively LLMs are able to replicate real-world reasoning mechanisms.
arXiv Detail & Related papers (2024-08-15T15:19:11Z)
Evaluating the Reliability of Self-Explanations in Large Language Models [2.8894038270224867]
We evaluate two kinds of such self-explanations - extractive and counterfactual. Our findings reveal, that, while these self-explanations can correlate with human judgement, they do not fully and accurately follow the model's decision process. We show that this gap can be bridged because prompting LLMs for counterfactual explanations can produce faithful, informative, and easy-to-verify results.
arXiv Detail & Related papers (2024-07-19T17:41:08Z)
Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode. We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
Inference to the Best Explanation in Large Language Models [6.037970847418495]
This paper proposes IBE-Eval, a framework inspired by philosophical accounts on Inference to the Best Explanation (IBE) IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features. Experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77% accuracy.
arXiv Detail & Related papers (2024-02-16T15:41:23Z)
Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions. A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations. Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code. At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes. We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning. In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
ThinkSum: Probabilistic reasoning over sets using large language models [18.123895485602244]
We propose a two-stage probabilistic inference paradigm, ThinkSum, which reasons over sets of objects or facts in a structured manner. We demonstrate the possibilities and advantages of ThinkSum on the BIG-bench suite of LLM evaluation tasks.
arXiv Detail & Related papers (2022-10-04T00:34:01Z)
Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals. It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation. It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
Explaining Question Answering Models through Text Generation [42.36596190720944]
Large pre-trained language models (LMs) have been shown to perform surprisingly well when fine-tuned on tasks that require commonsense and world knowledge. It is difficult to explain what is the knowledge in the LM that allows it to make a correct prediction in end-to-end architectures. We show on several tasks that our model reaches performance that is comparable to end-to-end architectures.
arXiv Detail & Related papers (2020-04-12T09:06:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.