I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering
- URL: http://arxiv.org/abs/2406.02060v1
- Date: Tue, 4 Jun 2024 07:43:12 GMT
- Title: I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering
- Authors: Valeriya Goloviznina, Evgeny Kotelnikov
- Abstract summary: This paper investigates the interpretation of large language models (LLMs) in the context of knowledge-based question answering.
The main hypothesis of the study is that correct and incorrect model behavior can be distinguished at the level of hidden states.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpretability and explainability of AI are becoming increasingly important in light of the rapid development of large language models (LLMs). This paper investigates the interpretation of LLMs in the context of knowledge-based question answering. The main hypothesis of the study is that correct and incorrect model behavior can be distinguished at the level of hidden states. The quantized models LLaMA-2-7B-Chat, Mistral-7B, Vicuna-7B and the MuSeRC question-answering dataset are used to test this hypothesis. The results of the analysis support the proposed hypothesis. We also identify the layers which have a negative effect on the model's behavior. As a practical application of the hypothesis, we propose additional training of such "weak" layers in order to improve the quality of the task solution.
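The probing setup described in the abstract can be illustrated with a short example: collect the per-layer hidden states a model produces for a question-answer pair, then fit a simple classifier per layer to separate examples the model answers correctly from those it answers incorrectly. The sketch below is a hedged illustration rather than the authors' code: the Hugging Face model identifier, float16 loading (quantization details omitted), last-token pooling, and logistic-regression probes with 5-fold cross-validation are assumptions made here for brevity.

```python
# Hedged sketch: probe per-layer hidden states to separate correct vs. incorrect answers.
# Assumptions (not from the paper): Hugging Face transformers + scikit-learn, last-token
# pooling, logistic-regression probes; quantization is omitted for simplicity.
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # one of the models named in the abstract

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def layer_features(text: str) -> np.ndarray:
    """Return the last-token hidden state of every layer, shape (n_layers + 1, hidden_dim)."""
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # out.hidden_states is a tuple of (1, seq_len, hidden_dim) tensors, one per layer
    # (plus the embedding layer); keep only the last token of each.
    return np.stack([h[0, -1].float().cpu().numpy() for h in out.hidden_states])

def probe_layers(texts, labels):
    """Cross-validated probe accuracy per layer for correct (1) vs. incorrect (0) answers."""
    feats = np.stack([layer_features(t) for t in texts])  # (n_examples, n_layers + 1, dim)
    accuracies = []
    for layer in range(feats.shape[1]):
        probe = LogisticRegression(max_iter=1000)
        accuracies.append(cross_val_score(probe, feats[:, layer], labels, cv=5).mean())
    return accuracies
```

Layers whose probe accuracy stays near chance would correspond to the "weak" layers that the abstract proposes to train further.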
Related papers
- Reasoning or a Semblance of it? A Diagnostic Study of Transitive Reasoning in LLMs [11.805264893752154]
We evaluate the reasoning capabilities of two large language models, LLaMA 2 and Flan-T5, by manipulating facts within two compositional datasets: QASC and Bamboogle.
Our findings reveal that while both models make use of the manipulated facts, Flan-T5 shows more resilience to these manipulations, exhibiting less variance than LLaMA 2.
This suggests that models may develop an understanding of transitivity through fine-tuning on knowingly relevant datasets.
arXiv Detail & Related papers (2024-10-26T15:09:07Z) - Are LLMs Aware that Some Questions are not Open-ended? [58.93124686141781]
We study whether Large Language Models are aware that some questions have limited answers and need to respond more deterministically.
The lack of question awareness in LLMs leads to two phenomena: (1) answering non-open-ended questions too casually, and (2) answering open-ended questions too blandly.
arXiv Detail & Related papers (2024-10-01T06:07:00Z) - Investigating Layer Importance in Large Language Models [28.156622049937216]
Large language models (LLMs) have gained increasing attention due to their prominent ability to understand and process texts.
The limited understanding of how LLMs work has obstructed their deployment in safety-critical scenarios and hindered the development of better models.
This study identifies cornerstone layers in LLMs and underscores their critical role for future research.
arXiv Detail & Related papers (2024-09-22T09:53:13Z) - What Do Language Models Learn in Context? The Structured Task Hypothesis [89.65045443150889]
Large language models (LLMs) learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL).
One popular hypothesis explains ICL by task selection.
Another popular hypothesis is that ICL is a form of meta-learning, i.e., the models learn a learning algorithm at pre-training time and apply it to the demonstration.
arXiv Detail & Related papers (2024-06-06T16:15:34Z) - Crafting Interpretable Embeddings by Asking LLMs Questions [89.49960984640363]
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks.
We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM (a minimal sketch of this idea appears after this list).
We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli.
arXiv Detail & Related papers (2024-05-26T22:30:29Z) - Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering [67.94354589215637]
Large Language Models (LLMs) are widely used for knowledge-seeking yet suffer from hallucinations.
In this paper, we perceive the LLMs' knowledge boundary (KB) with semi-open-ended questions (SoeQ).
We find that GPT-4 performs poorly on SoeQ and is often unaware of its KB.
Our auxiliary model, LLaMA-2-13B, is effective in discovering more ambiguous answers.
arXiv Detail & Related papers (2024-05-23T10:00:14Z) - You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments [37.03210795084276]
We examine whether the current format of prompting Large Language Models elicits responses in a consistent and robust manner.
Our experiments on 17 different LLMs reveal that even simple perturbations significantly downgrade a model's question-answering ability.
Our results suggest that the currently widespread practice of prompting is insufficient to accurately and reliably capture model perceptions.
arXiv Detail & Related papers (2023-11-16T09:50:53Z) - Are You Sure? Challenging LLMs Leads to Performance Drops in The FlipFlop Experiment [82.60594940370919]
We propose the FlipFlop experiment to study the multi-turn behavior of Large Language Models (LLMs).
We show that models flip their answers on average 46% of the time and that all models see a deterioration of accuracy between their first and final prediction, with an average drop of 17% (the FlipFlop effect).
We conduct finetuning experiments on an open-source LLM and find that finetuning on synthetically created data can mitigate sycophantic behavior, reducing performance deterioration by 60%, but cannot resolve it entirely.
arXiv Detail & Related papers (2023-11-14T23:40:22Z) - Can Large Language Models Infer Causation from Correlation? [104.96351414570239]
We test the pure causal inference skills of large language models (LLMs).
We formulate a novel task Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables.
We show that these models achieve close to random performance on the task.
arXiv Detail & Related papers (2023-06-09T12:09:15Z) - Explaining Question Answering Models through Text Generation [42.36596190720944]
Large pre-trained language models (LMs) have been shown to perform surprisingly well when fine-tuned on tasks that require commonsense and world knowledge.
In end-to-end architectures, it is difficult to explain what knowledge in the LM allows it to make a correct prediction.
We show on several tasks that our model reaches performance that is comparable to end-to-end architectures.
arXiv Detail & Related papers (2020-04-12T09:06:46Z)
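As referenced in the QA-Emb entry above, question-answering embeddings can be sketched in a few lines: a text is embedded by asking an LLM a fixed list of yes/no questions about it and recording the answers as binary features. The snippet below is a hedged illustration; the questions and the `ask_llm` callable are hypothetical stand-ins, not the paper's prompts or API.

```python
# Hedged sketch of question-answering embeddings (QA-Emb): each feature is the yes/no
# answer an LLM gives to a fixed question about the text. `ask_llm` is a hypothetical
# stand-in for an actual LLM call.
from typing import Callable, List
import numpy as np

QUESTIONS: List[str] = [
    "Does the text mention a person?",
    "Does the text describe a physical action?",
    "Is the text about time or duration?",
]

def qa_embed(text: str, ask_llm: Callable[[str], str]) -> np.ndarray:
    """Return a binary feature vector with one 0/1 entry per yes/no question."""
    features = []
    for question in QUESTIONS:
        prompt = f"Text: {text}\nQuestion: {question}\nAnswer yes or no."
        answer = ask_llm(prompt).strip().lower()
        features.append(1.0 if answer.startswith("yes") else 0.0)
    return np.array(features)
```

The resulting vector is interpretable by construction: each dimension can be read as the answer to a specific question.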