Cumulative Reasoning with Large Language Models
- URL: http://arxiv.org/abs/2308.04371v6
- Date: Tue, 2 Apr 2024 03:37:39 GMT
- Title: Cumulative Reasoning with Large Language Models
- Authors: Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao,
- Abstract summary: Cumulative Reasoning (CR) is a novel approach that utilizes language models cumulatively and iteratively.
We demonstrate CR's superiority through several complex reasoning tasks.
CR sets new state-of-the-art on the MATH dataset.
- Score: 12.267474250936123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent advancements in language models (LMs), their ability to solve complex problems remains limited. This paper introduces Cumulative Reasoning (CR), a novel approach that utilizes LMs cumulatively and iteratively, mirroring human thought processes for problem-solving. CR decomposes tasks into smaller, manageable components and leverages previous propositions for effective composition, significantly enhancing problem-solving capabilities. We demonstrate CR's superiority through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over the prior state-of-the-art. Additionally, CR sets new state-of-the-art on the MATH dataset, achieving a 4.2% increase from previous methods and a 43% relative improvement in the most challenging problems. By extending CR to incorporate a code environment without external aids like retrieval or web browsing, we further harness the computational and logical reasoning capabilities of LMs, achieving a remarkable 72.2% accuracy on the MATH dataset and outperforming the PAL/PoT method by 38.8%. Our work not only sets new state-of-the-art but also paves the way toward more sophisticated AI reasoning methods. The code is available at https://github.com/iiis-ai/cumulative-reasoning.
Related papers
- Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving [0.0]
This study evalu-ates 10 large language models (LLMs) with 7 to 8 billion parameters using the MATH dataset.
The focus is on their ability to generate executable Python code as a step in their reasoning process, involving over 9,450 code executions.
arXiv Detail & Related papers (2025-01-28T17:11:36Z) - Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving [0.0]
Recent advances in large language models (LLMs) have predominantly focused on maximizing accuracy and reasoning capabilities.
This paper investigates the potential synergy between reasoning enhancement and computational efficiency by analyzing the integration of two contrasting approaches.
arXiv Detail & Related papers (2024-12-20T08:42:45Z) - MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks.
Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales.
We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z) - Exploring Performance Contrasts in TableQA: Step-by-Step Reasoning Boosts Bigger Language Models, Limits Smaller Language Models [6.083393426133172]
This paper proposes a detailed prompting flow, termed Table-Logic, to investigate the performance contrasts between bigger and smaller language models (LMs)
By deploying this method, we observe a 7.8% accuracy improvement in bigger LMs like Llama-3-70B compared to the vanilla on HybridQA.
Our findings highlight the limitations of the step-by-step reasoning method in small models and provide potential insights for making improvements.
arXiv Detail & Related papers (2024-11-24T22:48:44Z) - BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search [22.672130194493793]
Large Language Models (LLMs) have exhibited exceptional performance across a broad range of tasks and domains.
They still encounter difficulties in solving mathematical problems due to the rigorous and logical nature of mathematics.
We propose a novel approach, BEATS, to enhance mathematical problem-solving abilities.
arXiv Detail & Related papers (2024-09-26T15:47:42Z) - Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance.
Existing direct preference learning algorithms are originally designed for the single-turn chat task.
We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z) - PORT: Preference Optimization on Reasoning Traces [1.7292887546437081]
This paper proposes using preference optimization methods on Chain-of-Thought steps in order to improve the mathematical reasoning performances of language models.
Our approach leads to increased accuracy on the GSM8K and AQuA-RAT mathematical reasoning benchmarks for Falcon2-11B and Mistral-7B.
The improved abilities transfer to non-mathematical tasks, including the ARC benchmark and symbolic reasoning challenges.
arXiv Detail & Related papers (2024-06-23T09:51:06Z) - Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems [50.76385564061713]
Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks.
CoT usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors, and step-missing errors.
We propose Deeply Understanding the Problems (DUP) to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors.
arXiv Detail & Related papers (2024-04-23T12:16:05Z) - Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids introducing external resources, relying instead on perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
arXiv Detail & Related papers (2024-03-04T16:21:54Z) - Tool-Augmented Reward Modeling [58.381678612409]
We propose a tool-augmented preference modeling approach, named Themis, to address limitations by empowering RMs with access to external environments.
Our study delves into the integration of external tools into RMs, enabling them to interact with diverse external sources.
In human evaluations, RLHF trained with Themis attains an average win rate of 32% when compared to baselines.
arXiv Detail & Related papers (2023-10-02T09:47:40Z) - Learning to Optimize Permutation Flow Shop Scheduling via Graph-based
Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.