CMCTS: A Constrained Monte Carlo Tree Search Framework for Mathematical Reasoning in Large Language Model
- URL: http://arxiv.org/abs/2502.11169v2
- Date: Mon, 16 Jun 2025 11:37:38 GMT
- Title: CMCTS: A Constrained Monte Carlo Tree Search Framework for Mathematical Reasoning in Large Language Model
- Authors: Qingwen Lin, Boyan Xu, Guimin Hu, Zijian Li, Zhifeng Hao, Keli Zhang, Ruichu Cai
- Abstract summary: This paper introduces the Constrained Monte Carlo Tree Search (CMCTS) framework to enhance the mathematical reasoning capabilities of Large Language Models (LLMs). By incorporating a constrained action space, a Process Reward Model (PRM), and partial order rules, CMCTS effectively addresses the limitations of existing MCTS methods in terms of state space diversity and action selection rationality. Experimental results show that CMCTS performs strongly across multiple mathematical reasoning benchmarks.
- Score: 20.81044211340922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces the Constrained Monte Carlo Tree Search (CMCTS) framework to enhance the mathematical reasoning capabilities of Large Language Models (LLMs). By incorporating a constrained action space, a Process Reward Model (PRM), and partial order rules, CMCTS effectively addresses the limitations of existing MCTS methods in terms of state space diversity and action selection rationality. Specifically, during the expansion phase, CMCTS restricts action sampling to a predefined constrained action set to increase candidate state diversity. In the simulation phase, it introduces partial order rules and the PRM to optimize action selection and prevent unreasonable state transitions. Experimental results show that CMCTS performs strongly across multiple mathematical reasoning benchmarks. Under a zero-shot setting, a 7B-parameter model achieves an average accuracy of 83.4%, surpassing the 72B baseline model by 4.8%. Ablation studies demonstrate that each component of the framework is crucial for performance improvement, and their combined use fully leverages their respective strengths. Overall, the CMCTS framework provides an effective approach to enhancing LLM mathematical reasoning capabilities, supported by theoretical analysis, and offers novel insights for future reasoning tasks.
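To make the abstract's mechanics concrete, below is a minimal Python sketch of an MCTS loop with the three ingredients the abstract names: expansion restricted to a predefined constrained action set, a partial order rule that blocks unreasonable transitions, and a PRM score in place of a full random rollout. The action labels, the prm_score stub, the violates_partial_order rule, and the llm_step callback are illustrative assumptions for this sketch, not the paper's actual action set, reward model, or rules.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical constrained action set; the paper's actual action labels may differ.
CONSTRAINED_ACTIONS = ["understand", "plan", "decompose", "compute", "reflect", "verify"]

@dataclass
class Node:
    state: str                      # partial reasoning trace so far
    action: Optional[str] = None    # action that produced this state
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0

def prm_score(state: str) -> float:
    """Stand-in for a Process Reward Model: score a partial reasoning state in [0, 1]."""
    return random.random()  # replace with a real PRM call

def violates_partial_order(node: Node, action: str) -> bool:
    """Toy partial-order rule: 'verify' is only allowed after some 'compute' step."""
    if action != "verify":
        return False
    cur = node
    while cur is not None:
        if cur.action == "compute":
            return False
        cur = cur.parent
    return True

def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Upper Confidence Bound for Trees, used during selection."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def expand(node: Node, llm_step) -> None:
    """Expansion: sample candidate states only for actions in the constrained set."""
    for action in CONSTRAINED_ACTIONS:
        if violates_partial_order(node, action):
            continue  # prune transitions the partial-order rules forbid
        child_state = llm_step(node.state, action)  # LLM continues the trace under this action
        node.children.append(Node(state=child_state, action=action, parent=node))

def simulate(node: Node) -> float:
    """Simulation: rate the partial solution with the PRM instead of a random rollout."""
    return prm_score(node.state)

def backpropagate(node: Optional[Node], reward: float) -> None:
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

def cmcts(question: str, llm_step, iterations: int = 50) -> Node:
    root = Node(state=question)
    for _ in range(iterations):
        node = root
        # Selection: descend by UCT until reaching a leaf.
        while node.children:
            node = max(node.children, key=lambda ch: uct(ch, node.visits))
        expand(node, llm_step)
        leaf = random.choice(node.children) if node.children else node
        backpropagate(leaf, simulate(leaf))
    return max(root.children, key=lambda ch: ch.visits) if root.children else root

# Usage with a dummy LLM step (replace with a real model call):
if __name__ == "__main__":
    best = cmcts("Q: What is 12 * 7?", lambda state, action: f"{state}\n[{action}] ...")
    print(best.action, best.visits)
```

The sketch keeps the standard select/expand/simulate/backpropagate skeleton and only changes where the paper intervenes: which actions may be sampled, which transitions are filtered out, and how a partial trace is scored.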
Related papers
- Discriminative Policy Optimization for Token-Level Reward Models [55.98642069903191]
Process reward models (PRMs) provide more nuanced supervision compared to outcome reward models (ORMs). Q-RM explicitly learns token-level Q-functions from preference data without relying on fine-grained annotations. Reinforcement learning with Q-RM significantly enhances training efficiency, achieving convergence 12 times faster than ORM on GSM8K and 11 times faster than step-level PRM on MATH.
arXiv Detail & Related papers (2025-05-29T11:40:34Z) - Activation-Guided Consensus Merging for Large Language Models [25.68958388022476]
We present Activation-Guided Consensus Merging (ACM), a plug-and-play merging framework that determines layer-specific merging coefficients. Experiments on Long-to-Short (L2S) and general merging tasks demonstrate that ACM consistently outperforms all baseline methods.
arXiv Detail & Related papers (2025-05-20T07:04:01Z) - MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search [27.378904180238557]
We introduce MCTS-RAG, a novel approach that enhances the reasoning capabilities of small language models on knowledge-intensive tasks. Unlike standard RAG methods, which typically retrieve information independently of reasoning, MCTS-RAG combines structured reasoning with adaptive retrieval. This integrated approach enhances decision-making, reduces hallucinations, and ensures improved factual accuracy and response consistency.
arXiv Detail & Related papers (2025-03-26T17:46:08Z) - Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning [113.49074603075032]
Recent studies have shown that making a model spend more time thinking through longer Chain of Thoughts (CoTs) enables it to gain significant improvements in complex reasoning tasks.
We explore whether scaling with longer CoTs can indeed impair the reasoning performance of Large Language Models (LLMs) in certain domains.
arXiv Detail & Related papers (2025-02-25T10:48:05Z) - LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z) - SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [48.28847964704554]
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks. We propose a novel approach for continuous-space reasoning that does not require modifying the underlying LLM.
arXiv Detail & Related papers (2025-02-17T18:52:29Z) - When More is Less: Understanding Chain-of-Thought Length in LLMs [53.77747102201451]
Chain-of-thought (CoT) reasoning enhances the multi-step reasoning capabilities of large language models (LLMs). However, for most models and tasks, does an increase in CoT length consistently lead to improved reasoning accuracy? In this paper, we observe a nuanced relationship: as the number of reasoning steps increases, performance initially improves but eventually decreases.
arXiv Detail & Related papers (2025-02-11T05:28:59Z) - Monte Carlo Tree Diffusion for System 2 Planning [57.50512800900167]
We introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of Monte Carlo Tree Search (MCTS). Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined.
arXiv Detail & Related papers (2025-02-11T02:51:42Z) - Policy Guided Tree Search for Enhanced LLM Reasoning [3.090041654375235]
Policy-Guided Tree Search (PGTS) is a framework that combines reinforcement learning with structured tree exploration to efficiently navigate reasoning paths. Our key innovation is a learned policy that dynamically decides between expanding, branching, backtracking, or terminating exploration, eliminating the need for manual heuristics or exhaustive search.
arXiv Detail & Related papers (2025-02-04T22:08:20Z) - Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains. Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities. We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z) - BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model. We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z) - Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective [82.9413277326097]
Chain-of-Reasoning (CoR) is a novel unified framework integrating multiple reasoning paradigms. CoR generates multiple potential answers via different reasoning paradigms and synthesizes them into a coherent final solution. Experimental results demonstrate that CoR-Math-7B significantly outperforms current SOTA models.
arXiv Detail & Related papers (2025-01-19T16:53:26Z) - FLARE: Faithful Logic-Aided Reasoning and Exploration [50.9814063216852]
We introduce a novel approach for traversing the problem space using task decompositions. We use Large Language Models to plan a solution and soft-formalise the query into facts and predicates using logic programming code. Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z) - Break the Chain: Large Language Models Can be Shortcut Reasoners [18.047917626825548]
Chain-of-Thought (CoT) reasoning methods utilize complex modules but are hampered by high token consumption, limited applicability, and challenges in thinking.
This paper conducts a critical evaluation of CoT prompting, extending beyond arithmetic to include complex logical and commonsense reasoning tasks.
We propose the integration of human-like heuristics and shortcuts into language models (LMs) through "break the chain" strategies.
arXiv Detail & Related papers (2024-06-04T14:02:53Z) - SPO: Sequential Monte Carlo Policy Optimisation [41.52684912140086]
We introduce SPO: Sequential Monte Carlo Policy Optimisation.
We show that SPO provides robust policy improvement and efficient scaling properties.
We demonstrate statistically significant improvements in performance relative to model-free and model-based baselines.
arXiv Detail & Related papers (2024-02-12T10:32:47Z) - Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning [31.110005898556892]
Large Language Models (LLMs) have shown impressive capabilities, yet they still struggle with math reasoning.
We propose CoT-Influx, a novel approach that pushes the boundary of few-shot Chain-of-Thoughts (CoT) learning.
CoT-Influx employs a coarse-to-fine pruner to maximize the input of effective and concise CoT examples.
arXiv Detail & Related papers (2023-12-14T13:03:13Z) - Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA [5.117094291273979]
Large Language Models (LLMs) have shown outstanding performance across wide range of downstream tasks.
We propose Sci-CoT, a two-stage framework that separates the processes of generating rationales and inferring answers.
Our 80-million-parameter model exceeds the performance of BLOOM-176B on the ARC-Easy dataset under the few-shot setting.
arXiv Detail & Related papers (2023-08-09T03:18:07Z) - When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search (CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning.
CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance.
It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
arXiv Detail & Related papers (2022-10-04T07:34:06Z) - MCCE: Monte Carlo sampling of realistic counterfactual explanations [2.156170153103442]
MCCE is a novel counterfactual explanation method that generates on-manifold, actionable and valid counterfactuals by modeling the joint distribution of the mutable features.
We compare MCCE with a range of state-of-the-art on-manifold counterfactual methods using four well-known data sets.
arXiv Detail & Related papers (2021-11-18T16:40:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.