Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies
- URL: http://arxiv.org/abs/2507.00606v2
- Date: Thu, 03 Jul 2025 02:30:05 GMT
- Title: Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies
- Authors: Tao Xiong, Xavier Hu, Wenyan Fan, Shengyu Zhang
- Abstract summary: Mixture of Reasoning embeds diverse reasoning strategies into large language models. MoR significantly enhances performance, with MoR150 achieving 0.730 (2.2% improvement) using CoT prompting and 0.734 (13.5% improvement) compared to baselines.
- Score: 6.7519234849348075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) excel in complex tasks through advanced prompting techniques like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), but their reliance on manually crafted, task-specific prompts limits adaptability and efficiency. We introduce Mixture of Reasoning (MoR), a training framework that embeds diverse reasoning strategies into LLMs for autonomous, task-adaptive reasoning without external prompt engineering. MoR has two phases: Thought Generation, creating reasoning chain templates with models like GPT-4o, and SFT Dataset Construction, pairing templates with benchmark datasets for supervised fine-tuning. Our experiments show that MoR significantly enhances performance, with MoR150 achieving 0.730 (2.2% improvement) using CoT prompting and 0.734 (13.5% improvement) compared to baselines. MoR eliminates the need for task-specific prompts, offering a generalizable solution for robust reasoning across diverse tasks.
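The abstract's two-phase recipe is concrete enough to sketch as a small data-construction pipeline. The Python below is an illustrative reconstruction from the abstract alone, not the authors' released code: the prompt wording, the strategy list, the `openai` client calls, and the JSONL record format are all assumptions.

```python
# Minimal sketch of the two MoR phases as described in the abstract.
# Assumptions (not from the paper): the prompt wording, the openai
# client usage, the strategy list, and the JSONL record format.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical strategy pool; the paper embeds "diverse reasoning
# strategies" but does not enumerate them in the abstract.
STRATEGIES = ["chain-of-thought", "tree-of-thought", "decomposition"]

def generate_thought_template(strategy: str) -> str:
    """Phase 1, Thought Generation: ask a strong model (GPT-4o, per the
    abstract) for a reusable reasoning-chain template for one strategy."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Write a reusable, step-by-step reasoning template using "
                f"the {strategy} strategy. Use the placeholder [PROBLEM] "
                f"instead of any concrete question."
            ),
        }],
    )
    return resp.choices[0].message.content

def build_sft_dataset(templates: list[str], benchmark: list[dict],
                      path: str = "mor_sft.jsonl") -> None:
    """Phase 2, SFT Dataset Construction: pair templates with benchmark
    questions to form supervised fine-tuning records."""
    with open(path, "w") as f:
        for i, ex in enumerate(benchmark):
            template = templates[i % len(templates)]  # round-robin pairing
            record = {
                "prompt": ex["question"],
                "response": template.replace("[PROBLEM]", ex["question"]),
            }
            f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    templates = [generate_thought_template(s) for s in STRATEGIES]
    build_sft_dataset(templates, [{"question": "If x + 3 = 7, what is x?"}])
```

The round-robin pairing here is a placeholder for brevity; how MoR actually matches strategies to benchmark items is a detail the abstract does not specify.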
Related papers
- CIMR: Contextualized Iterative Multimodal Reasoning for Robust Instruction Following in LVLMs [2.238122883754112]
CIMR is a novel framework that introduces a context-aware iterative reasoning and self-correction module. CIMR achieves 91.5% accuracy, outperforming state-of-the-art models such as GPT-4V, LLaVA-1.5, MiniGPT-4, and InstructBLIP.
arXiv Detail & Related papers (2025-07-22T18:39:18Z) - MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization [74.04867639197445]
MiroMind-M1 is a set of fully open-source RLMs built on the Qwen-2.5 backbone. Our models are trained in two stages: SFT on a carefully curated corpus of 719K math-reasoning problems with verified CoT trajectories, followed by RLVR on 62K challenging and verifiable problems.
arXiv Detail & Related papers (2025-07-19T16:21:23Z) - Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better. TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks. We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z) - Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router [9.580226379350737]
Multi-step reasoning has proven essential for enhancing the problem-solving capabilities of Large Language Models. Yet, many reasoning steps are relatively simple and can be handled by more efficient smaller-scale language models. We propose R2-Reasoner, a novel framework that enables collaborative reasoning across heterogeneous LLMs (a minimal routing sketch follows this list).
arXiv Detail & Related papers (2025-06-06T09:18:56Z) - AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models [32.51746551988431]
AdaReasoner is an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations. AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy. It consistently outperforms standard baselines, preserves out-of-distribution robustness, and yields gains on knowledge-intensive tasks through tailored prompts.
arXiv Detail & Related papers (2025-05-22T22:06:11Z) - Modularization is Better: Effective Code Generation with Modular Prompting [9.955541341324007]
We propose a novel prompting technique, called MoT, to enhance the code generation performance of Large Language Models. MoT exploits modularization principles to decompose complex programming problems into smaller, independent reasoning steps. It structures the reasoning process using an MLR Graph, which hierarchically organizes reasoning steps.
arXiv Detail & Related papers (2025-03-16T12:23:23Z) - LATTE: Learning to Think with Vision Specialists [103.5952731807559]
We propose LATTE, a family of vision-language models that offload perception to state-of-the-art vision models, enabling them to focus solely on reasoning over high-quality perceptual information.
arXiv Detail & Related papers (2024-12-07T00:42:04Z) - MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks. Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales. We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z) - Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding [74.31981011985681]
Large language models (LLMs) have shown impressive capabilities, but still struggle with complex reasoning tasks requiring multiple steps.
We introduce LaTent Reasoning Optimization (LaTRO), a principled framework that formulates reasoning as sampling from a latent distribution.
We validate LaTRO through experiments on GSM8K and ARC-Challenge datasets using multiple model architectures.
arXiv Detail & Related papers (2024-11-06T22:02:30Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF [10.81723269312202]
Mixture-of-Experts (MoE) models have been proposed as an energy-efficient path to larger and more capable language models.
We benchmark our proposed model on a large-scale inner-source dataset (160k hours).
arXiv Detail & Related papers (2024-04-25T08:34:21Z) - Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z)
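As a concrete illustration of the step-level routing idea in R2-Reasoner above, here is a minimal sketch that dispatches easy reasoning steps to a small model and hard ones to a large one. Everything in it is an assumption for illustration: the paper trains a reinforced router, whereas this uses a fixed length heuristic, and the `Model` interface is hypothetical.

```python
# Hedged sketch of step-level model routing (R2-Reasoner-style).
# Illustrative only: the paper learns a router with reinforcement
# learning; this substitutes a fixed token-count heuristic.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    generate: Callable[[str], str]  # prompt -> completion

def route_step(step: str, small: Model, large: Model,
               hard_threshold: int = 30) -> str:
    """Send a single reasoning step to the cheaper model unless it
    looks hard. Token count stands in for a learned difficulty score."""
    difficulty = len(step.split())
    chosen = large if difficulty > hard_threshold else small
    return chosen.generate(step)

def solve(steps: list[str], small: Model, large: Model) -> list[str]:
    # Each step of a multi-step problem is routed independently, so
    # most of the work lands on the more efficient small model.
    return [route_step(s, small, large) for s in steps]
```

The design point, per the abstract, is that simple steps dominate multi-step reasoning, so a well-trained router shifts most generation onto smaller-scale models without hurting accuracy.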