The Matrix: A Bayesian learning model for LLMs
- URL: http://arxiv.org/abs/2402.03175v1
- Date: Mon, 5 Feb 2024 16:42:10 GMT
- Title: The Matrix: A Bayesian learning model for LLMs
- Authors: Siddhartha Dalal and Vishal Misra
- Abstract summary: We introduce a Bayesian learning model to understand the behavior of Large Language Models (LLMs).
Our approach involves constructing an ideal generative text model represented by a multinomial transition probability matrix with a prior.
We discuss the continuity of the mapping between embeddings and multinomial distributions, and present the Dirichlet approximation theorem to approximate any prior.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a Bayesian learning model to understand the
behavior of Large Language Models (LLMs). We explore the optimization metric of
LLMs, which is based on predicting the next token, and develop a novel model
grounded in this principle. Our approach involves constructing an ideal
generative text model represented by a multinomial transition probability
matrix with a prior, and we examine how LLMs approximate this matrix. We
discuss the continuity of the mapping between embeddings and multinomial
distributions, and present the Dirichlet approximation theorem to approximate
any prior. Additionally, we demonstrate how text generation by LLMs aligns with
Bayesian learning principles and delve into the implications for in-context
learning, specifically explaining why in-context learning emerges in larger
models, where prompts act as samples that update the prior. Our findings
indicate that the behavior of LLMs is consistent with Bayesian Learning,
offering new insights into their functioning and potential applications.
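The abstract's core object — a multinomial next-token distribution with a Dirichlet prior, updated as context tokens are observed — can be illustrated with a minimal sketch. This is not code from the paper; the toy vocabulary and counts are hypothetical, and the sketch only shows the standard Dirichlet-multinomial conjugate update that the model builds on:

```python
import numpy as np

def dirichlet_update(alpha, observed_counts):
    """Conjugate Bayesian update: a Dirichlet prior plus multinomial
    counts yields a Dirichlet posterior with parameters alpha + counts."""
    return alpha + observed_counts

def posterior_predictive(alpha):
    """Posterior-predictive next-token probabilities: the Dirichlet
    mean, alpha_i / sum(alpha)."""
    return alpha / alpha.sum()

# Toy 4-token vocabulary with a uniform Dirichlet prior.
vocab = ["the", "cat", "sat", "mat"]
alpha_prior = np.ones(len(vocab))

# Next-token counts observed in the context (the "prompt as sample").
counts = np.array([2.0, 5.0, 1.0, 0.0])

alpha_post = dirichlet_update(alpha_prior, counts)
probs = posterior_predictive(alpha_post)
print(dict(zip(vocab, probs.round(3))))
```

In this reading, in-context learning is just the posterior sharpening as the prompt supplies more pseudo-counts — larger contexts move the predictive distribution further from the prior.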
Related papers
- LLMs for XAI: Future Directions for Explaining Explanations [50.87311607612179]
We focus on refining explanations computed using existing XAI algorithms.
Initial experiments and a user study suggest that LLMs offer a promising way to enhance the interpretability and usability of XAI.
arXiv Detail & Related papers (2024-05-09T19:17:47Z) - Towards Modeling Learner Performance with Large Language Models [7.002923425715133]
This paper investigates whether the pattern recognition and sequence modeling capabilities of LLMs can be extended to the domain of knowledge tracing.
We compare two approaches to using LLMs for this task, zero-shot prompting and model fine-tuning, with existing, non-LLM approaches to knowledge tracing.
While LLM-based approaches do not achieve state-of-the-art performance, fine-tuned LLMs surpass the performance of naive baseline models and perform on par with standard Bayesian Knowledge Tracing approaches.
arXiv Detail & Related papers (2024-02-29T14:06:34Z) - Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated by large language models (LLMs).
We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - Deep de Finetti: Recovering Topic Distributions from Large Language Models [10.151434138893034]
Large language models (LLMs) can produce long, coherent passages of text.
LLMs must represent the latent structure that characterizes a document.
We investigate a complementary aspect, namely the document's topic structure.
arXiv Detail & Related papers (2023-12-21T16:44:39Z) - In-Context Explainers: Harnessing LLMs for Explaining Black Box Models [28.396104334980492]
Large Language Models (LLMs) have demonstrated exceptional capabilities in complex tasks like machine translation, commonsense reasoning, and language understanding.
One of the primary reasons for the adaptability of LLMs in such diverse tasks is their in-context learning (ICL) capability, which allows them to perform well on new tasks by simply using a few task samples in the prompt.
We propose a novel framework, In-Context Explainers, comprising three approaches that exploit the ICL capabilities of LLMs to explain the predictions made by other predictive models.
arXiv Detail & Related papers (2023-10-09T15:31:03Z) - Faithful Explanations of Black-box NLP Models Using LLM-generated
Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z) - Evaluating and Explaining Large Language Models for Code Using Syntactic
Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z) - Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
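The kernel-regression account of ICL mentioned above can be sketched as a Nadaraya-Watson estimator over the prompt's examples, with softmax similarity weights playing the role of attention. This is a hypothetical illustration of the general idea, not the paper's actual attention analysis; the example embeddings and labels are made up:

```python
import numpy as np

def kernel_regression(query, examples, labels, temperature=1.0):
    """Nadaraya-Watson estimate: attention-like softmax weights over
    example vectors, then a weighted average of their labels."""
    sims = examples @ query / temperature   # similarity scores
    weights = np.exp(sims - sims.max())     # numerically stable softmax
    weights /= weights.sum()
    return weights @ labels                 # weighted label average

# Hypothetical in-context examples: 2-d "embeddings" with scalar labels.
examples = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
labels = np.array([0.0, 1.0, 0.5])
query = np.array([1.0, 0.0])

prediction = kernel_regression(query, examples, labels)
print(prediction)
```

The prediction is pulled toward the labels of examples most similar to the query, which is the behavior the kernel-regression view attributes to attention during ICL.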
arXiv Detail & Related papers (2023-05-22T06:45:02Z) - Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.