ThinkSum: Probabilistic reasoning over sets using large language models
- URL: http://arxiv.org/abs/2210.01293v2
- Date: Fri, 2 Jun 2023 17:25:19 GMT
- Title: ThinkSum: Probabilistic reasoning over sets using large language models
- Authors: Batu Ozturkler, Nikolay Malkin, Zhen Wang, Nebojsa Jojic
- Abstract summary: We propose a two-stage probabilistic inference paradigm, ThinkSum, which reasons over sets of objects or facts in a structured manner.
We demonstrate the possibilities and advantages of ThinkSum on the BIG-bench suite of LLM evaluation tasks.
- Score: 18.123895485602244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have a substantial capacity for high-level
analogical reasoning: reproducing patterns in linear text that occur in their
training data (zero-shot evaluation) or in the provided context (few-shot
in-context learning). However, recent studies show that even the most advanced
LLMs fail in scenarios that require reasoning over multiple objects or facts
and making sequences of logical deductions. We propose a two-stage
probabilistic inference paradigm, ThinkSum, which reasons over sets of objects
or facts in a structured manner. In the first stage (Think - retrieval of
associations), an LLM is queried in parallel over a set of phrases extracted
from the prompt or an auxiliary model call. In the second stage (Sum -
probabilistic inference or reasoning), the results of these queries are
aggregated to make the final prediction. We demonstrate the possibilities and
advantages of ThinkSum on the BIG-bench suite of LLM evaluation tasks,
achieving improvements over the state of the art using GPT-family models on
thirteen difficult tasks, often with far smaller model variants. We also
compare and contrast ThinkSum with other proposed modifications to direct
prompting of LLMs, such as variants of chain-of-thought prompting. Our results
suggest that because the probabilistic inference in ThinkSum is performed
outside of calls to the LLM, ThinkSum is less sensitive to prompt design,
yields more interpretable predictions, and can be flexibly combined with latent
variable models to extract structured knowledge from LLMs. Overall, our
proposed paradigm represents a promising approach for enhancing the reasoning
capabilities of LLMs.
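To make the two-stage paradigm concrete, below is a minimal sketch of Think and Sum in Python, based only on the description in the abstract. The LLM scoring call (llm_logprob) and the prompt template are hypothetical placeholders rather than the paper's released code, and averaging probabilities over the extracted set is just one simple aggregation rule; the actual method uses task-specific variants.

```python
import math
from typing import Callable, Dict, List


def think(llm_logprob: Callable[[str, str], float],
          facts: List[str],
          answers: List[str],
          template: str) -> Dict[str, List[float]]:
    # Think stage: query the LLM independently (and, in practice, in parallel)
    # for every extracted fact/phrase paired with every candidate answer.
    return {
        answer: [llm_logprob(template.format(fact=fact), answer) for fact in facts]
        for answer in answers
    }


def sum_stage(scores: Dict[str, List[float]]) -> str:
    # Sum stage: aggregate the per-fact likelihoods outside the LLM.
    # Here: average the probabilities over the set and return the argmax answer.
    def aggregate(logps: List[float]) -> float:
        return sum(math.exp(lp) for lp in logps) / len(logps)
    return max(scores, key=lambda a: aggregate(scores[a]))


# Toy usage with a stubbed scorer; a real run would call an LLM API that
# returns log P(completion | prompt). The stub below is purely illustrative.
def fake_llm_logprob(prompt: str, completion: str) -> float:
    return -1.0 if ("sparrow" in prompt and completion == "can fly") else -3.0


facts = ["A sparrow is a bird.", "A penguin is a bird."]
choice = sum_stage(think(fake_llm_logprob, facts,
                         ["can fly", "cannot fly"],
                         "{fact} Therefore it"))
print(choice)  # -> "can fly"
```

Because the aggregation happens outside the model, the per-fact scores remain inspectable, which is the basis of the interpretability claim in the abstract.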
Related papers
- Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models [6.922021128239465]
Recent advances in AI have been driven by the capabilities of large language models (LLMs).
This paper introduces a framework that is both theoretical and practical, aimed at assessing how effectively LLMs are able to replicate real-world reasoning mechanisms.
arXiv Detail & Related papers (2024-08-15T15:19:11Z)
- LAMPO: Large Language Models as Preference Machines for Few-shot Ordinal Classification [34.9210323553677]
We introduce LAMPO, a novel paradigm that leverages Large Language Models (LLMs) for solving few-shot multi-class ordinal classification tasks.
Extensive experiments on seven public datasets demonstrate LAMPO's remarkably competitive performance across a diverse spectrum of applications.
arXiv Detail & Related papers (2024-08-06T15:55:05Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Probabilistic Reasoning in Generative Large Language Models [18.983753573277596]
This paper considers the challenges Large Language Models (LLMs) face when reasoning over text that includes information involving uncertainty explicitly quantified via probability values.
We introduce the Bayesian Linguistic Inference dataset (BLInD), a new dataset designed to test the probabilistic reasoning capabilities of LLMs.
We present several prompting strategies that map the problem to different formal representations, including Python code, probabilistic algorithms, and probabilistic logical programming.
arXiv Detail & Related papers (2024-02-14T23:05:44Z)
- A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models [0.8702432681310401]
We use a Bayesian network to implement a hypothesis about how a task is solved.
The resulting models do not exhibit a strong similarity to GPT-3.5.
We discuss the implications of this as well as the framework's potential to approximate LLM decisions better in future work.
arXiv Detail & Related papers (2024-02-07T12:26:12Z)
- Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be conveyed to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning [50.00090601424348]
Large language models (LLMs) have shown remarkable capabilities in various natural language understanding tasks.
We propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs.
arXiv Detail & Related papers (2023-11-13T06:13:38Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)