ThoughtSource: A central hub for large language model reasoning data
- URL: http://arxiv.org/abs/2301.11596v5
- Date: Thu, 27 Jul 2023 09:37:35 GMT
- Title: ThoughtSource: A central hub for large language model reasoning data
- Authors: Simon Ott, Konstantin Hebenstreit, Valentin Liévin, Christoffer
  Egeberg Hother, Milad Moradi, Maximilian Mayrhauser, Robert Praas, Ole
  Winther, Matthias Samwald
- Abstract summary: ThoughtSource is a meta-dataset and software library for chain-of-thought (CoT) reasoning.
The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs.
- Score: 13.185186859548326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   Large language models (LLMs) such as GPT-4 have recently demonstrated
impressive results across a wide range of tasks. LLMs are still limited,
however, in that they frequently fail at complex reasoning, their reasoning
processes are opaque, they are prone to 'hallucinate' facts, and there are
concerns about their underlying biases. Letting models verbalize reasoning
steps as natural language, a technique known as chain-of-thought prompting, has
recently been proposed as a way to address some of these issues. Here we
present ThoughtSource, a meta-dataset and software library for chain-of-thought
(CoT) reasoning. The goal of ThoughtSource is to improve future artificial
intelligence systems by facilitating qualitative understanding of CoTs,
enabling empirical evaluations, and providing training data. This first release
of ThoughtSource integrates seven scientific/medical, three general-domain and
five math word question answering datasets.
 
      
Related papers
- Thinking About Thinking: SAGE-nano's Inverse Reasoning for Self-Aware Language Models [0.0]
 Large Language Models (LLMs) have demonstrated remarkable capabilities at solving complex reasoning tasks with Chain-of-Thought prompting.
We introduce inverse reasoning, a novel paradigm enabling LLMs to decompose and explain their own reasoning chains post-hoc.
Our work creates new avenues for transparent AI systems and closes significant gaps in AI safety, education, and scientific discovery.
 arXiv  Detail & Related papers  (2025-06-30T09:53:41Z)
- Interleaved Reasoning for Large Language Models via Reinforcement Learning [22.403928213802036]
 Long chain-of-thought (CoT) enhances large language models' (LLM) reasoning capabilities.
We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions.
 arXiv  Detail & Related papers  (2025-05-26T07:58:17Z)
- On the Thinking-Language Modeling Gap in Large Language Models [68.83670974539108]
 We show that there is a significant gap between the modeling of languages and thoughts.
We propose a new prompt technique termed Language-of-Thoughts (LoT) to demonstrate and alleviate this gap.
 arXiv  Detail & Related papers  (2025-05-19T09:31:52Z)
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer [37.81465564673498]
 Large Language Models (LLMs) have demonstrated promising capabilities in solving mathematical reasoning tasks.
We propose MetaLadder, a framework that explicitly prompts LLMs to recall and reflect on meta-problems.
Our experiments on mathematical benchmarks demonstrate that our MetaLadder significantly boosts LLMs' problem-solving accuracy.
 arXiv  Detail & Related papers  (2025-03-19T04:36:35Z)
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [60.04718679054704]
 We introduce Sketch-of-Thought (SoT), a novel prompting framework.
It combines cognitive-inspired reasoning paradigms with linguistic constraints to minimize token usage.
SoT achieves token reductions of 76% with negligible accuracy impact.
 arXiv  Detail & Related papers  (2025-03-07T06:57:17Z)
- Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment [54.62926010621013]
 We introduce a novel task, code reasoning, to provide a new perspective for the reasoning abilities of large language models.
We summarize three meta-benchmarks based on established forms of logical reasoning, and instantiate these into eight specific benchmark tasks.
We present a new pathway exploration pipeline inspired by human intricate problem-solving methods.
 arXiv  Detail & Related papers  (2025-02-17T10:39:58Z)
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
 MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
 arXiv  Detail & Related papers  (2024-05-25T15:07:33Z)
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
 Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
 arXiv  Detail & Related papers  (2024-04-23T21:08:49Z)
- Information Re-Organization Improves Reasoning in Large Language Models [22.2946033364035]
 We propose an information re-organization (InfoRE) method to enhance the reasoning ability of large language models (LLMs).
Our method involves extracting logical relationships from the contextual content, such as documents or paragraphs, and subsequently pruning redundant content to minimize noise.
To demonstrate the effectiveness of our approach in improving the reasoning ability, we conduct experiments using Llama2-70B, GPT-3.5, and GPT-4 on various contextually aware multi-hop reasoning tasks.
 arXiv  Detail & Related papers  (2024-04-22T08:47:27Z)
- How Do Humans Write Code? Large Models Do It the Same Way Too [14.954886191356342]
 Program-of-Thought (PoT) replaces natural language-based Chain-of-Thought (CoT) as the most popular method in Large Language Models.
Using PoT introduces more reasoning errors, such as incorrect formulas or flawed logic, compared to CoT.
We propose Human-Think Language (HTL), which leverages a suite of strategies that help integrate PoT and CoT.
 arXiv  Detail & Related papers  (2024-02-24T05:40:01Z)
- Implicit Chain of Thought Reasoning via Knowledge Distillation [58.80851216530288]
 Instead of explicitly producing the chain of thought reasoning steps, we use the language model's internal hidden states to perform implicit reasoning.
We find that this approach enables solving tasks previously not solvable without explicit chain-of-thought, at a speed comparable to no chain-of-thought.
 arXiv  Detail & Related papers  (2023-11-02T17:59:49Z)
- MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
 We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
 arXiv  Detail & Related papers  (2023-10-24T17:59:20Z)
- LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers [60.009969929857704]
 Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society.
In this work, we reformulate such tasks as modular neurosymbolic programming, which we call LINC.
We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
 arXiv  Detail & Related papers  (2023-10-23T17:58:40Z)
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models [15.711472857775085]
 Graph of Thoughts (GoT) is a framework that advances prompting capabilities in large language models (LLMs).
The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph.
This work brings the reasoning closer to human thinking or brain mechanisms such as recurrence.
 arXiv  Detail & Related papers  (2023-08-18T17:29:23Z)
- MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic [0.6537995248511139]
 Theory of Mind (ToM) is a critical component of intelligence but its assessment remains the subject of heated debates.
Here, we leverage dynamic epistemic logic to isolate a particular component of ToM and to generate controlled problems.
Our findings indicate that some language model scaling does not consistently yield results better than random chance.
 arXiv  Detail & Related papers  (2023-05-05T08:14:48Z)
- Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings [61.04460792203266]
 We introduce VCoT, a novel method that leverages chain-of-thought prompting with vision-language grounding to bridge the logical gaps within sequential data.
Our method uses visual guidance to generate synthetic multimodal infillings that add consistent and novel information to reduce the logical gaps for downstream tasks.
 arXiv  Detail & Related papers  (2023-05-03T17:58:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     