RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior
- URL: http://arxiv.org/abs/2508.03140v1
- Date: Tue, 05 Aug 2025 06:38:18 GMT
- Title: RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior
- Authors: Junyao Yang, Jianwei Wang, Huiping Zhuang, Cen Chen, Ziqian Zeng
- Abstract summary: Large Language Models (LLMs) with long chain-of-thought (CoT) capability, termed Reasoning Models, demonstrate superior intricate problem-solving abilities. We introduce RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior. Our results show that RCP-Merging successfully merges a reasoning model with domain-specific ones, improving domain task performance by 9.5% and 9.2% over state-of-the-art methods.
- Score: 18.699960005016433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) with long chain-of-thought (CoT) capability, termed Reasoning Models, demonstrate superior performance on intricate problems through multi-step long CoT reasoning. To create a dual-capability model that combines long CoT capability with domain-specific knowledge without substantial computational and data costs, model merging emerges as a highly resource-efficient approach. However, merging domain-specific LLMs with long CoT models poses significant challenges, as existing merging methods suffer from reasoning capability degradation and can even produce gibberish output or output collapse. To overcome this, we introduce RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior, a novel merging framework designed to integrate domain-specific LLMs with long CoT capability while maintaining performance in the original domain. Treating the reasoning model's weights as a foundational prior, our method uses a reasoning capability indicator to preserve the weights core to long CoT capability while selectively merging essential domain-specific weights. We conducted extensive experiments on Qwen2.5-7B, Llama3.1-8B, and Qwen2.5-1.5B models in the BioMedicine and Finance domains. Our results show that RCP-Merging successfully merges a reasoning model with domain-specific ones, improving domain task performance by 9.5% and 9.2% over state-of-the-art methods, without significantly harming the original long CoT reasoning capability.
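The abstract does not give the exact formulation, but the core idea of treating the reasoning model's weights as a prior and gating domain-specific updates by a reasoning-capability indicator can be illustrated with a minimal sketch. The indicator below (magnitude of the reasoning task vector relative to the base model), the quantile threshold `tau`, and the scaling factor `alpha` are illustrative assumptions, not the paper's definitions.

```python
import torch


def rcp_merge_sketch(base, reasoning, domain, tau=0.5, alpha=1.0):
    """Illustrative sketch of reasoning-capability-as-prior merging.

    base / reasoning / domain: dicts mapping parameter names to tensors.
    The reasoning model is treated as the prior; domain-specific deltas
    are only added where the (assumed) indicator suggests the parameter
    is not critical for long-CoT reasoning.
    """
    merged = {}
    for name, w_base in base.items():
        w_reason = reasoning[name]
        w_domain = domain[name]

        # Task vectors relative to the shared base model.
        delta_reason = w_reason - w_base
        delta_domain = w_domain - w_base

        # Assumed indicator: parameters with large reasoning deltas are
        # treated as core to long-CoT capability and kept unchanged.
        indicator = delta_reason.abs()
        threshold = torch.quantile(indicator.float().flatten(), tau)
        keep_reasoning = indicator >= threshold

        # Start from the reasoning prior; selectively add domain deltas
        # only where long-CoT capability is unlikely to be disturbed.
        merged[name] = torch.where(
            keep_reasoning,
            w_reason,
            w_reason + alpha * delta_domain,
        )
    return merged


# Toy usage with random "models" sharing one parameter tensor.
base = {"layer.weight": torch.randn(4, 4)}
reasoning = {"layer.weight": base["layer.weight"] + 0.1 * torch.randn(4, 4)}
domain = {"layer.weight": base["layer.weight"] + 0.1 * torch.randn(4, 4)}
print(rcp_merge_sketch(base, reasoning, domain)["layer.weight"].shape)
```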
Related papers
- Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models [49.598776427454176]
Large Reasoning Models (LRMs) have gradually become a research hotspot due to their outstanding performance in handling complex tasks. However, with the widespread application of these models, the problem of overthinking has gradually emerged. Various efficient reasoning methods have been proposed, aiming to reduce the length of reasoning paths without compromising model performance and reasoning capability.
arXiv Detail & Related papers (2025-08-04T06:54:31Z)
- WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training [64.0932926819307]
We present Warmup-Stable and Merge (WSM), a framework that establishes a formal connection between learning rate decay and model merging. WSM provides a unified theoretical foundation for emulating various decay strategies. Our framework consistently outperforms the widely-adopted Warmup-Stable-Decay (WSD) approach across multiple benchmarks.
arXiv Detail & Related papers (2025-07-23T16:02:06Z)
- Why Do More Experts Fail? A Theoretical Analysis of Model Merging [51.18155031364046]
Model merging dramatically reduces storage and computational resources by combining multiple expert models into a single multi-task model. Recent model merging methods have shown promising results, but struggle to maintain performance gains as the number of merged models increases. We show that the limited effective parameter space imposes a strict constraint on the number of models that can be successfully merged.
arXiv Detail & Related papers (2025-05-27T14:10:46Z)
- Activation-Guided Consensus Merging for Large Language Models [25.68958388022476]
We present Activation-Guided Consensus Merging (ACM), a plug-and-play merging framework that determines layer-specific merging coefficients. Experiments on Long-to-Short (L2S) and general merging tasks demonstrate that ACM consistently outperforms all baseline methods.
arXiv Detail & Related papers (2025-05-20T07:04:01Z)
- Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models [23.34070841541423]
We propose Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT). Our experiments demonstrate that models trained using LS-Mixture SFT, compared to those trained with direct SFT, achieved an average accuracy improvement of 2.3%. This work offers an approach to endow non-reasoning models with reasoning capabilities through supervised fine-tuning.
arXiv Detail & Related papers (2025-05-06T12:18:11Z)
- Synergistic Weak-Strong Collaboration by Aligning Preferences [53.47675666475273]
Current Large Language Models (LLMs) excel in general reasoning yet struggle with specialized tasks requiring proprietary or domain-specific knowledge. We propose a collaborative framework that pairs a specialized weak model with a general strong model. We find that the collaboration significantly outperforms each model alone by leveraging complementary strengths.
arXiv Detail & Related papers (2025-04-21T15:57:33Z)
- Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? [57.17826305464394]
o1-like models produce long Chain-of-Thought (CoT) reasoning steps to improve the reasoning abilities of existing Large Language Models (LLMs). We introduce DeltaBench, which includes long CoTs generated by different o1-like models for different reasoning tasks. Based on DeltaBench, we first perform a fine-grained analysis of the generated long CoTs to assess the effectiveness and efficiency of different o1-like models.
arXiv Detail & Related papers (2025-02-26T17:59:27Z)
- Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning [113.49074603075032]
Recent studies have shown that making a model spend more time thinking through longer Chains of Thought (CoTs) enables it to gain significant improvements in complex reasoning tasks. We explore whether scaling with longer CoTs can indeed impair the reasoning performance of Large Language Models (LLMs) in certain domains.
arXiv Detail & Related papers (2025-02-25T10:48:05Z)
- Unconstrained Model Merging for Enhanced LLM Reasoning [42.079040543428036]
We explore the potential of merging multiple expert models into a single large language model.
We propose an unconstrained model merging framework that accommodates both homogeneous and heterogeneous model architectures.
Across 7 benchmarks and 9 reasoning-optimized LLMs, we reveal the key finding that reasoning capability can emerge from merging.
arXiv Detail & Related papers (2024-10-17T16:04:07Z)
- Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs [63.36637269634553]
We introduce a novel approach where LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step. We show that fine-tuning on DCoT improves performance over the CoT baseline across model families and scales. Our work is also significant because both quantitative analyses and manual evaluations reveal that the observed gains stem from the models' ability to refine an initial reasoning chain.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
- MCC-KD: Multi-CoT Consistent Knowledge Distillation [39.327560600207626]
We propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill reasoning capabilities into smaller models.
In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions.
We investigate the effectiveness of MCC-KD with different model architectures and various model scales.
arXiv Detail & Related papers (2023-10-23T09:32:53Z)
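The MCC-KD entry above only states that multiple rationales are generated per question and that consistency is enforced among the corresponding predictions. A symmetric-KL consistency term is one common way to realize such a constraint; the sketch below is an illustrative assumption, not necessarily the paper's exact objective, and the function and tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F


def consistency_loss(logits_per_rationale):
    """Pairwise symmetric-KL consistency across answer distributions
    obtained from different rationales for the same question
    (illustrative; the exact MCC-KD objective may differ)."""
    log_probs = [F.log_softmax(l, dim=-1) for l in logits_per_rationale]
    loss, pairs = 0.0, 0
    for i in range(len(log_probs)):
        for j in range(i + 1, len(log_probs)):
            p, q = log_probs[i], log_probs[j]
            # KL(p || q) + KL(q || p), both computed from log-probabilities.
            loss = loss + F.kl_div(q, p, log_target=True, reduction="batchmean")
            loss = loss + F.kl_div(p, q, log_target=True, reduction="batchmean")
            pairs += 1
    return loss / max(pairs, 1)


# Toy usage: 3 rationales, batch of 2 questions, 5 answer classes.
logits = [torch.randn(2, 5) for _ in range(3)]
print(consistency_loss(logits))
```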