ThoughtSource: A central hub for large language model reasoning data
- URL: http://arxiv.org/abs/2301.11596v5
- Date: Thu, 27 Jul 2023 09:37:35 GMT
- Title: ThoughtSource: A central hub for large language model reasoning data
- Authors: Simon Ott, Konstantin Hebenstreit, Valentin Liévin, Christoffer Egeberg Hother, Milad Moradi, Maximilian Mayrhauser, Robert Praas, Ole Winther, Matthias Samwald
- Abstract summary: ThoughtSource is a meta-dataset and software library for chain-of-thought (CoT) reasoning.
The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs.
- Score: 13.185186859548326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) such as GPT-4 have recently demonstrated
impressive results across a wide range of tasks. LLMs are still limited,
however, in that they frequently fail at complex reasoning, their reasoning
processes are opaque, they are prone to 'hallucinate' facts, and there are
concerns about their underlying biases. Letting models verbalize reasoning
steps as natural language, a technique known as chain-of-thought prompting, has
recently been proposed as a way to address some of these issues. Here we
present ThoughtSource, a meta-dataset and software library for chain-of-thought
(CoT) reasoning. The goal of ThoughtSource is to improve future artificial
intelligence systems by facilitating qualitative understanding of CoTs,
enabling empirical evaluations, and providing training data. This first release
of ThoughtSource integrates seven scientific/medical, three general-domain and
five math word question answering datasets.
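To make the idea of a meta-dataset concrete, the sketch below shows one way a unified chain-of-thought record could be represented in Python. The field names (source_dataset, question, cot_steps, answer) are illustrative assumptions, not ThoughtSource's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a unified chain-of-thought record; the field names
# are hypothetical, not ThoughtSource's actual schema.
@dataclass
class CoTExample:
    source_dataset: str          # e.g. "gsm8k", "medqa", "commonsense_qa"
    question: str                # the question posed to the model
    cot_steps: list[str] = field(default_factory=list)  # verbalized reasoning steps
    answer: str = ""             # gold (or model-generated) final answer

example = CoTExample(
    source_dataset="gsm8k",
    question="A pen costs $2 and a notebook costs $3. What do 2 pens and 1 notebook cost?",
    cot_steps=["2 pens cost 2 * $2 = $4.", "Adding the notebook: $4 + $3 = $7."],
    answer="$7",
)
print(example.cot_steps)
```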
Related papers
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
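As a rough illustration of framing reasoning as search, the sketch below runs a beam search over partial reasoning paths. propose_steps and score_path are hypothetical stand-ins for an LLM step generator and a reward model; this is not the paper's actual algorithm.

```python
import heapq

# Hypothetical stand-ins: a real system would call an LLM to propose candidate
# next reasoning steps and a reward model to score partial reasoning paths.
def propose_steps(question: str, path: list[str]) -> list[str]:
    return [f"step {len(path) + 1}a", f"step {len(path) + 1}b"]

def score_path(question: str, path: list[str]) -> float:
    return -len(path)  # placeholder scoring function

def beam_search(question: str, beam_width: int = 2, depth: int = 3) -> list[str]:
    beam = [([], 0.0)]  # (partial reasoning path, score)
    for _ in range(depth):
        candidates = []
        for path, _ in beam:
            for step in propose_steps(question, path):
                new_path = path + [step]
                candidates.append((new_path, score_path(question, new_path)))
        # keep only the highest-scoring partial paths
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return max(beam, key=lambda c: c[1])[0]

print(beam_search("Solve: 3 * (4 + 5) = ?"))
```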
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But can they really "reason" over natural language?
This question has received significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z)
- Information Re-Organization Improves Reasoning in Large Language Models [22.2946033364035]
We propose an information re-organization (InfoRE) method to enhance the reasoning ability of large language models (LLMs).
Our method involves extracting logical relationships from the contextual content, such as documents or paragraphs, and subsequently pruning redundant content to minimize noise.
To demonstrate the effectiveness of our approach in improving the reasoning ability, we conduct experiments using Llama2-70B, GPT-3.5, and GPT-4 on various contextually aware multi-hop reasoning tasks.
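A minimal sketch of the extract-then-prune idea follows, assuming a stubbed relation extractor in place of an LLM call; the names and the pruning heuristic are illustrative only.

```python
import re

# Hypothetical sketch: a real implementation would prompt an LLM to extract
# logical relations from the context; here a stub returns fixed triples.
def extract_relations(context: str) -> list[tuple[str, str, str]]:
    return [
        ("Alice", "manages", "the Paris office"),
        ("the Paris office", "opened in", "2019"),
        ("the cafeteria", "serves", "lunch"),  # irrelevant to the question below
    ]

def content_words(text: str) -> set[str]:
    stop = {"the", "a", "an", "did", "when", "in", "of", "to"}
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop}

def reorganize(context: str, question: str) -> str:
    # keep only relations sharing content words with the question (noise pruning)
    relevant = [r for r in extract_relations(context)
                if content_words(question) & content_words(" ".join(r))]
    lines = [f"{s} {p} {o}." for s, p, o in relevant]
    return "\n".join(lines) + "\n\nQuestion: " + question

print(reorganize("...", "When did the office Alice manages open?"))
```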
arXiv Detail & Related papers (2024-04-22T08:47:27Z)
- How Do Humans Write Code? Large Models Do It the Same Way Too [14.954886191356342]
Program-of-Thought (PoT) has replaced natural language-based Chain-of-Thought (CoT) as the most popular prompting method in Large Language Models.
However, using PoT introduces more reasoning errors, such as incorrect formulas or flawed logic, compared to CoT.
We propose Human-Think Language (HTL), which leverages a suite of strategies that help integrate PoT and CoT.
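For readers unfamiliar with PoT, the sketch below shows the basic pattern: the model emits a short program instead of prose reasoning, and the answer comes from executing it. The program text here is hand-written as a stand-in for model output.

```python
# Program-of-Thought illustration: the model's output is a short program
# rather than prose. The program below is a hand-written stand-in for what
# an LLM would generate for this word problem.
problem = ("A train travels 60 km in the first hour and 80 km in the second. "
           "What is its average speed?")

generated_program = """
distance = 60 + 80          # total distance in km
time = 2                    # total time in hours
answer = distance / time    # average speed in km/h
"""

namespace = {}
exec(generated_program, namespace)   # execute the model-written program
print(namespace["answer"])           # 70.0
```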
arXiv Detail & Related papers (2024-02-24T05:40:01Z)
- Implicit Chain of Thought Reasoning via Knowledge Distillation [58.80851216530288]
Instead of explicitly producing the chain of thought reasoning steps, we use the language model's internal hidden states to perform implicit reasoning.
We find that this approach enables solving tasks previously not solvable without explicit chain-of-thought, at a speed comparable to no chain-of-thought.
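A toy sketch of the distillation idea follows, assuming the teacher's hidden states across explicit CoT steps supervise the student's successive layers so the student answers directly; dimensions, data, and losses are illustrative, not the paper's setup.

```python
import torch
import torch.nn as nn

# Toy sketch: the teacher's hidden states across explicit CoT steps
# (horizontal) supervise the student's successive layers (vertical),
# so the student learns to answer without emitting the steps.
d, num_steps = 16, 3
teacher_states = torch.randn(num_steps, d)        # frozen teacher CoT states

student_layers = nn.ModuleList([nn.Linear(d, d) for _ in range(num_steps)])
answer_head = nn.Linear(d, 1)
optimizer = torch.optim.Adam(
    list(student_layers.parameters()) + list(answer_head.parameters()), lr=1e-2)

question_emb = torch.randn(d)
gold_answer = torch.tensor([1.0])

for _ in range(100):
    h, distill_loss = question_emb, 0.0
    for layer, target in zip(student_layers, teacher_states):
        h = torch.tanh(layer(h))
        distill_loss = distill_loss + nn.functional.mse_loss(h, target)
    answer_loss = nn.functional.mse_loss(answer_head(h), gold_answer)
    loss = answer_loss + distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```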
arXiv Detail & Related papers (2023-11-02T17:59:49Z)
- MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, our dataset instances are free-text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z)
- LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society.
In this work, we reformulate such tasks as modular neurosymbolic programming, which we call LINC.
We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
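A minimal sketch of the modular pipeline follows: a (stubbed) translation step maps natural language to logical form, and a toy forward-chaining prover stands in for a real first-order logic prover. All names here are illustrative assumptions.

```python
# Sketch of a LINC-style pipeline with stubs: an LLM would translate the
# natural-language premises and conclusion into logical form; here we
# hand-code that output and use a toy forward-chaining prover over ground
# Horn clauses in place of a real first-order logic prover.
def translate_to_logic(premises, conclusion):
    facts = {"human(socrates)"}
    rules = [({"human(socrates)"}, "mortal(socrates)")]  # (body, head)
    goal = "mortal(socrates)"
    return facts, rules, goal

def forward_chain(facts, rules, goal):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return goal in derived

facts, rules, goal = translate_to_logic(
    ["Socrates is human.", "All humans are mortal."],
    "Socrates is mortal.")
print("Entailed" if forward_chain(facts, rules, goal) else "Not entailed")
```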
arXiv Detail & Related papers (2023-10-23T17:58:40Z)
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models [15.711472857775085]
Graph of Thoughts (GoT) is a framework that advances prompting capabilities in large language models (LLMs).
The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph.
This work brings the reasoning closer to human thinking or brain mechanisms such as recurrence.
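The sketch below illustrates why a graph is more expressive than a chain or tree of thoughts: aggregation merges several thoughts, and refinement loops a thought back into itself. The data structure and operations are illustrative assumptions, not the paper's API.

```python
from dataclasses import dataclass, field

# Minimal sketch of a thought graph: each node is an LLM "thought", and edges
# record which thoughts a new thought was derived from. Scoring is omitted.
@dataclass
class Thought:
    content: str
    parents: list["Thought"] = field(default_factory=list)

def generate(parent: Thought, k: int = 2) -> list[Thought]:
    # branch: an LLM would propose k continuations of `parent`
    return [Thought(f"{parent.content} -> branch {i}", [parent]) for i in range(k)]

def aggregate(thoughts: list[Thought]) -> Thought:
    # merge several partial solutions into one (impossible in a tree)
    merged = " + ".join(t.content for t in thoughts)
    return Thought(f"merge({merged})", thoughts)

def refine(thought: Thought) -> Thought:
    # refinement loop: improve a thought using itself as input
    return Thought(f"refined({thought.content})", [thought])

root = Thought("problem statement")
best = refine(aggregate(generate(root)))
print(best.content)
```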
arXiv Detail & Related papers (2023-08-18T17:29:23Z)
- MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic [0.6537995248511139]
Theory of Mind (ToM) is a critical component of intelligence but its assessment remains the subject of heated debates.
Here, we leverage dynamic epistemic logic to isolate a particular component of ToM and to generate controlled problems.
Our findings indicate that language model scaling does not consistently yield results better than random chance.
arXiv Detail & Related papers (2023-05-05T08:14:48Z)
- Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings [61.04460792203266]
We introduce VCoT, a novel method that leverages chain-of-thought prompting with vision-language grounding to bridge the logical gaps within sequential data.
Our method uses visual guidance to generate synthetic multimodal infillings that add consistent and novel information to reduce the logical gaps for downstream tasks.
arXiv Detail & Related papers (2023-05-03T17:58:29Z)