Deep de Finetti: Recovering Topic Distributions from Large Language
Models
- URL: http://arxiv.org/abs/2312.14226v1
- Date: Thu, 21 Dec 2023 16:44:39 GMT
- Title: Deep de Finetti: Recovering Topic Distributions from Large Language
Models
- Authors: Liyi Zhang, R. Thomas McCoy, Theodore R. Sumers, Jian-Qiao Zhu, Thomas
L. Griffiths
- Abstract summary: Large language models (LLMs) can produce long, coherent passages of text.
LLMs must represent the latent structure that characterizes a document.
We investigate a complementary aspect, namely the document's topic structure.
- Score: 10.151434138893034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) can produce long, coherent passages of text,
suggesting that LLMs, although trained on next-word prediction, must represent
the latent structure that characterizes a document. Prior work has found that
internal representations of LLMs encode one aspect of latent structure, namely
syntax; here we investigate a complementary aspect, namely the document's topic
structure. We motivate the hypothesis that LLMs capture topic structure by
connecting LLM optimization to implicit Bayesian inference. De Finetti's
theorem shows that exchangeable probability distributions can be represented as
a mixture with respect to a latent generating distribution. Although text is
not exchangeable at the level of syntax, exchangeability is a reasonable
starting assumption for topic structure. We thus hypothesize that predicting
the next token in text will lead LLMs to recover latent topic distributions. We
examine this hypothesis using Latent Dirichlet Allocation (LDA), an
exchangeable probabilistic topic model, as a target, and we show that the
representations formed by LLMs encode both the topics used to generate
synthetic data and those used to explain natural corpus data.
Related papers
- Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations [48.07182711678573]
ASTrust generates explanations grounded in the relationship between model confidence and syntactic structures of programming languages.
We develop an automated visualization that illustrates the aggregated model confidence scores superimposed on sequence, heat-map, and graph-based visuals of syntactic structures from ASTs.
arXiv Detail & Related papers (2024-07-12T04:38:28Z)
- Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL [78.80673954827773]
Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias.
We propose using Semantic Role Labeling (SRL) as a fundamental task to explore LLMs' ability to extract structured semantics.
We find interesting potential: LLMs can indeed capture semantic structures, but scaling up model size does not always yield corresponding improvements.
Surprisingly, LLMs and untrained humans make significantly overlapping errors, which account for almost 30% of all errors.
arXiv Detail & Related papers (2024-05-10T11:44:05Z)
- A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models [0.8702432681310401]
We use a Bayesian network to implement a hypothesis about how a task is solved.
The resulting models do not exhibit a strong similarity to GPT-3.5.
We discuss the implications of this as well as the framework's potential to approximate LLM decisions better in future work.
arXiv Detail & Related papers (2024-02-07T12:26:12Z)
- The Matrix: A Bayesian learning model for LLMs [1.169389391551085]
We introduce a Bayesian learning model to understand the behavior of Large Language Models (LLMs).
Our approach involves constructing an ideal generative text model represented by a multinomial transition probability matrix with a prior.
We discuss the continuity of the mapping between embeddings and multinomial distributions, and present the Dirichlet approximation theorem to approximate any prior.
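The transition-matrix construction summarized above can be sketched in a few lines; the vocabulary size and Dirichlet concentration here are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab = 5

# A multinomial transition matrix with a Dirichlet prior on each row:
# P[i, j] is the probability that token j follows token i.
alpha = np.full(vocab, 0.5)           # Dirichlet concentration (illustrative)
P = rng.dirichlet(alpha, size=vocab)  # one transition matrix sampled from the prior

# Generate a short token sequence from the sampled matrix
tokens = [0]
for _ in range(10):
    tokens.append(int(rng.choice(vocab, p=P[tokens[-1]])))
print(tokens)
```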
arXiv Detail & Related papers (2024-02-05T16:42:10Z)
- Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z)
- Let Models Speak Ciphers: Multiagent Debate through Embeddings [84.20336971784495]
We introduce CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue.
By deviating from natural language, CIPHER offers an advantage of encoding a broader spectrum of information without any modification to the model weights.
This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs.
arXiv Detail & Related papers (2023-10-10T03:06:38Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
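The kernel-regression view of in-context learning can be illustrated with a toy numeric example; the linear demonstration task and the dot-product softmax kernel are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Demonstrations (x_i, y_i) and a query x*, as in an in-context prompt
X = rng.normal(size=(8, 4))              # 8 demonstration inputs
w_true = np.array([1.0, -2.0, 0.5, 0.0])
y = X @ w_true                           # their labels (a linear task)
x_query = rng.normal(size=4)             # the query input

# Attention-style kernel regression: the prediction is a softmax-weighted
# average of demonstration labels, weighted by similarity to the query.
scores = X @ x_query
weights = np.exp(scores) / np.exp(scores).sum()
y_pred = weights @ y
print(f"kernel-regression prediction: {y_pred:.3f}")
```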
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
- Guiding the PLMs with Semantic Anchors as Intermediate Supervision: Towards Interpretable Semantic Parsing [57.11806632758607]
We propose to augment current pretrained language models with a hierarchical decoder network.
By taking the first-principle structures as the semantic anchors, we propose two novel intermediate supervision tasks.
We conduct intensive experiments on several semantic parsing benchmarks and demonstrate that our approach can consistently outperform the baselines.
arXiv Detail & Related papers (2022-10-04T07:27:29Z)
- ThinkSum: Probabilistic reasoning over sets using large language models [18.123895485602244]
We propose a two-stage probabilistic inference paradigm, ThinkSum, which reasons over sets of objects or facts in a structured manner.
We demonstrate the possibilities and advantages of ThinkSum on the BIG-bench suite of LLM evaluation tasks.
arXiv Detail & Related papers (2022-10-04T00:34:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.