RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior
- URL: http://arxiv.org/abs/2508.03140v1
- Date: Tue, 05 Aug 2025 06:38:18 GMT
- Title: RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior
- Authors: Junyao Yang, Jianwei Wang, Huiping Zhuang, Cen Chen, Ziqian Zeng
- Abstract summary: Large Language Models (LLMs) with long chain-of-thought (CoT) capability, termed Reasoning Models, demonstrate superior intricate problem-solving abilities. We introduce RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior. Our results show that RCP-Merging successfully merges a reasoning model with domain-specific ones, improving domain task performance by 9.5% and 9.2% over state-of-the-art methods.
- Score: 18.699960005016433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) with long chain-of-thought (CoT) capability, termed Reasoning Models, demonstrate superior performance on intricate problems through multi-step long CoT reasoning. To create a dual-capability model that combines long CoT capability with domain-specific knowledge without substantial computational and data costs, model merging emerges as a highly resource-efficient approach. However, merging domain-specific LLMs with long CoT models poses significant challenges, as existing merging methods suffer from reasoning capability degradation and can even produce gibberish output or output collapse. To overcome this, we introduce RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior, a novel merging framework designed to integrate domain-specific LLMs with long CoT capability while maintaining performance in the original domain. Treating the reasoning model's weights as a foundational prior, our method uses a reasoning capability indicator to preserve the weights core to long CoT capability while selectively merging essential domain-specific weights. We conducted extensive experiments on Qwen2.5-7B, Llama3.1-8B, and Qwen2.5-1.5B models in the BioMedicine and Finance domains. Our results show that RCP-Merging successfully merges a reasoning model with domain-specific ones, improving domain task performance by 9.5% and 9.2% over state-of-the-art methods, without significantly harming the original long CoT reasoning capability.
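The abstract does not give the exact formulation, but the core idea of treating the reasoning model's weights as a prior and gating domain-specific updates by a reasoning-capability indicator can be illustrated with a minimal sketch. The indicator below (magnitude of the reasoning task vector relative to the base model), the quantile threshold `tau`, and the scaling factor `alpha` are illustrative assumptions, not the paper's definitions.

```python
import torch


def rcp_merge_sketch(base, reasoning, domain, tau=0.5, alpha=1.0):
    """Illustrative sketch of reasoning-capability-as-prior merging.

    base / reasoning / domain: dicts mapping parameter names to tensors.
    The reasoning model is treated as the prior; domain-specific deltas
    are only added where the (assumed) indicator suggests the parameter
    is not critical for long-CoT reasoning.
    """
    merged = {}
    for name, w_base in base.items():
        w_reason = reasoning[name]
        w_domain = domain[name]

        # Task vectors relative to the shared base model.
        delta_reason = w_reason - w_base
        delta_domain = w_domain - w_base

        # Assumed indicator: parameters with large reasoning deltas are
        # treated as core to long-CoT capability and kept unchanged.
        indicator = delta_reason.abs()
        threshold = torch.quantile(indicator.float().flatten(), tau)
        keep_reasoning = indicator >= threshold

        # Start from the reasoning prior; selectively add domain deltas
        # only where long-CoT capability is unlikely to be disturbed.
        merged[name] = torch.where(
            keep_reasoning,
            w_reason,
            w_reason + alpha * delta_domain,
        )
    return merged


# Toy usage with random "models" sharing one parameter tensor.
base = {"layer.weight": torch.randn(4, 4)}
reasoning = {"layer.weight": base["layer.weight"] + 0.1 * torch.randn(4, 4)}
domain = {"layer.weight": base["layer.weight"] + 0.1 * torch.randn(4, 4)}
print(rcp_merge_sketch(base, reasoning, domain)["layer.weight"].shape)
```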
Related papers
- Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models [49.598776427454176]
Large Reasoning Models (LRMs) have gradually become a research hotspot due to their outstanding performance in handling complex tasks. However, with the widespread application of these models, the problem of overthinking has gradually emerged. Various efficient reasoning methods have been proposed, aiming to reduce the length of reasoning paths without compromising model performance and reasoning capability.
arXiv Detail & Related papers (2025-08-04T06:54:31Z)
- WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training [64.0932926819307]
We present Warmup-Stable and Merge (WSM), a framework that establishes a formal connection between learning rate decay and model merging. WSM provides a unified theoretical foundation for emulating various decay strategies. Our framework consistently outperforms the widely-adopted Warmup-Stable-Decay (WSD) approach across multiple benchmarks.
arXiv Detail & Related papers (2025-07-23T16:02:06Z)
- Why Do More Experts Fail? A Theoretical Analysis of Model Merging [51.18155031364046]
Model merging dramatically reduces storage and computational resources by combining multiple expert models into a single multi-task model. Recent model merging methods have shown promising results, but struggle to maintain performance gains as the number of merged models increases. We show that the limited effective parameter space imposes a strict constraint on the number of models that can be successfully merged.
arXiv Detail & Related papers (2025-05-27T14:10:46Z)
- Activation-Guided Consensus Merging for Large Language Models [25.68958388022476]
We present Activation-Guided Consensus Merging (ACM), a plug-and-play merging framework that determines layer-specific merging coefficients. Experiments on Long-to-Short (L2S) and general merging tasks demonstrate that ACM consistently outperforms all baseline methods.
arXiv Detail & Related papers (2025-05-20T07:04:01Z)
- Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models [23.34070841541423]
We propose Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT). Our experiments demonstrate that models trained using LS-Mixture SFT, compared to those trained with direct SFT, achieved an average accuracy improvement of 2.3%. This work offers an approach to endow non-reasoning models with reasoning capabilities through supervised fine-tuning.
arXiv Detail & Related papers (2025-05-06T12:18:11Z)
- Synergistic Weak-Strong Collaboration by Aligning Preferences [53.47675666475273]
Current Large Language Models (LLMs) excel in general reasoning yet struggle with specialized tasks requiring proprietary or domain-specific knowledge. We propose a collaborative framework that pairs a specialized weak model with a general strong model. We find that the collaboration significantly outperforms each model alone by leveraging complementary strengths.
arXiv Detail & Related papers (2025-04-21T15:57:33Z)
- Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? [57.17826305464394]
o1-like models produce long Chain-of-Thought (CoT) reasoning steps to improve the reasoning abilities of existing Large Language Models (LLMs). We introduce DeltaBench, which includes long CoTs generated by different o1-like models for different reasoning tasks. Based on DeltaBench, we first perform a fine-grained analysis of the generated long CoTs to assess the effectiveness and efficiency of different o1-like models.
arXiv Detail & Related papers (2025-02-26T17:59:27Z)
- Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning [113.49074603075032]
Recent studies have shown that making a model spend more time thinking through longer Chains of Thought (CoTs) enables it to gain significant improvements in complex reasoning tasks. We explore whether scaling with longer CoTs can indeed impair the reasoning performance of Large Language Models (LLMs) in certain domains.
arXiv Detail & Related papers (2025-02-25T10:48:05Z)
- Unconstrained Model Merging for Enhanced LLM Reasoning [42.079040543428036]
We explore the potential of merging multiple expert models into a single large language model.
We propose an unconstrained model merging framework that accommodates both homogeneous and heterogeneous model architectures.
Across 7 benchmarks and 9 reasoning-optimized LLMs, we reveal the key finding that reasoning capability can emerge from merging.
arXiv Detail & Related papers (2024-10-17T16:04:07Z)
- Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs [63.36637269634553]
We introduce a novel approach where LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step. We show that fine-tuning on DCoT improves performance over the CoT baseline across model families and scales. Our work is also significant because both quantitative analyses and manual evaluations reveal that the observed gains stem from the models' ability to refine an initial reasoning chain.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
- MCC-KD: Multi-CoT Consistent Knowledge Distillation [39.327560600207626]
We propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill reasoning capabilities into smaller models.
In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions.
We investigate the effectiveness of MCC-KD with different model architectures and various model scales.
arXiv Detail & Related papers (2023-10-23T09:32:53Z)
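The MCC-KD entry above only states that multiple rationales are generated per question and that consistency is enforced among the corresponding predictions. A symmetric-KL consistency term is one common way to realize such a constraint; the sketch below is an illustrative assumption, not necessarily the paper's exact objective, and the function and tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F


def consistency_loss(logits_per_rationale):
    """Pairwise symmetric-KL consistency across answer distributions
    obtained from different rationales for the same question
    (illustrative; the exact MCC-KD objective may differ)."""
    log_probs = [F.log_softmax(l, dim=-1) for l in logits_per_rationale]
    loss, pairs = 0.0, 0
    for i in range(len(log_probs)):
        for j in range(i + 1, len(log_probs)):
            p, q = log_probs[i], log_probs[j]
            # KL(p || q) + KL(q || p), both computed from log-probabilities.
            loss = loss + F.kl_div(q, p, log_target=True, reduction="batchmean")
            loss = loss + F.kl_div(p, q, log_target=True, reduction="batchmean")
            pairs += 1
    return loss / max(pairs, 1)


# Toy usage: 3 rationales, batch of 2 questions, 5 answer classes.
logits = [torch.randn(2, 5) for _ in range(3)]
print(consistency_loss(logits))
```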