Revisiting Model Interpolation for Efficient Reasoning
- URL: http://arxiv.org/abs/2510.10977v1
- Date: Mon, 13 Oct 2025 03:30:01 GMT
- Title: Revisiting Model Interpolation for Efficient Reasoning
- Authors: Taiqiang Wu, Runming Yang, Tao Liu, Jiahao Wang, Ngai Wong
- Abstract summary: We revisit the simplest merging method that interpolates two weights directly. We observe that model interpolation follows a three-stage evolutionary paradigm with distinct behaviors on the reasoning trajectory.
- Score: 27.32667995137936
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Model merging, typically on Instruct and Thinking models, has shown remarkable performance for efficient reasoning. In this paper, we systematically revisit the simplest merging method that interpolates two weights directly. Particularly, we observe that model interpolation follows a three-stage evolutionary paradigm with distinct behaviors on the reasoning trajectory. These dynamics provide a principled guide for navigating the performance-cost trade-off. Empirical results demonstrate that a strategically interpolated model surprisingly surpasses sophisticated model merging baselines on both efficiency and effectiveness. We further validate our findings with extensive ablation studies on model layers, modules, and decoding strategies. Ultimately, this work demystifies model interpolation and offers a practical framework for crafting models with precisely targeted reasoning capabilities. Code is available at https://github.com/wutaiqiang/MI.
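As a concrete illustration of the method the abstract describes, below is a minimal sketch of direct weight interpolation between two architecture-identical checkpoints. The file names, coefficient values, and helper name are illustrative assumptions, not the paper's released code (see the linked repository for that).

```python
# Minimal sketch of direct weight interpolation between two
# architecture-identical checkpoints (e.g., an Instruct model and a
# Thinking model). File names and coefficients are illustrative.
import torch

def interpolate_state_dicts(sd_a, sd_b, lam):
    """Per-tensor interpolation: theta = (1 - lam) * theta_a + lam * theta_b."""
    assert sd_a.keys() == sd_b.keys(), "checkpoints must share parameter names"
    return {k: torch.lerp(sd_a[k].float(), sd_b[k].float(), lam) for k in sd_a}

sd_instruct = torch.load("instruct.pt", map_location="cpu")
sd_thinking = torch.load("thinking.pt", map_location="cpu")

# Sweeping lam from 0 to 1 traces the interpolation path along which the
# paper reports three distinct behavioral stages; choosing a point within
# the desired stage is how the performance-cost trade-off is navigated.
for lam in (0.2, 0.5, 0.8):
    torch.save(interpolate_state_dicts(sd_instruct, sd_thinking, lam),
               f"merged_lam{lam}.pt")
```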
Related papers
- Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z)
- Merge and Guide: Unifying Model Merging and Guided Decoding for Controllable Multi-Objective Generation [49.98025799046136]
We introduce Merge-And-GuidE, a two-stage framework that leverages model merging for guided decoding. In Stage 1, MAGE resolves a compatibility problem between the guidance and base models. In Stage 2, we merge explicit and implicit value models into a unified guidance proxy, which then steers the decoding of the base model from Stage 1.
arXiv Detail & Related papers (2025-10-04T11:10:07Z)
- Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws [52.10468229008941]
This paper formalizes an emerging learning paradigm that uses a trained model as a reference to guide and enhance the training of a target model through strategic data selection or weighting. We provide theoretical insights into why this approach improves generalization and data efficiency compared to training without a reference model. Building on these insights, we introduce a novel method for Contrastive Language-Image Pretraining with a reference model, termed DRRho-CLIP.
arXiv Detail & Related papers (2025-05-10T16:55:03Z)
- Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs [51.09983600916971]
Recent research indicates that models demonstrating linearity enhance the performance of task arithmetic. We argue that this linearity already exists within the model's submodules. We propose an innovative model merging strategy that independently merges these submodules; a hedged sketch of the plain task-arithmetic baseline appears after this list.
arXiv Detail & Related papers (2025-04-15T06:23:24Z)
- Rethinking Weight-Averaged Model-merging [15.2881959315021]
Model merging, particularly through weight averaging, has shown surprising effectiveness in saving computations and improving model performance without any additional training. In this work, we reinterpret weight-averaged model merging through the lens of interpretability and provide empirical insights into the underlying mechanisms that govern its behavior; a minimal sketch of the weight-averaging baseline also appears after this list.
arXiv Detail & Related papers (2024-11-14T08:02:14Z)
- Exploring Model Kinship for Merging Large Language Models [73.98345036483299]
We study model evolution through iterative merging, drawing an analogy to biological evolution. We show that model kinship is closely linked to the performance improvements achieved by merging. We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship.
arXiv Detail & Related papers (2024-10-16T14:29:29Z)
- Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach [25.927323251675386]
We leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models.
We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models.
Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.
arXiv Detail & Related papers (2024-01-02T17:08:26Z)
- Inferring effective couplings with Restricted Boltzmann Machines [3.150368120416908]
Generative models attempt to encode correlations observed in the data at the level of the Boltzmann weight associated with an energy function in the form of a neural network.
We propose a solution by implementing a direct mapping between the Restricted Boltzmann Machine and an effective Ising spin Hamiltonian.
arXiv Detail & Related papers (2023-09-05T14:55:09Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Distributional Depth-Based Estimation of Object Articulation Models [21.046351215949525]
We propose a method that efficiently learns distributions over articulation model parameters directly from depth images.
Our core contributions include a novel representation for distributions over rigid body transformations.
We introduce a novel deep learning based approach, DUST-net, that performs category-independent articulation model estimation.
arXiv Detail & Related papers (2021-08-12T17:44:51Z)
- Improving Sequential Latent Variable Models with Autoregressive Flows [30.053464816814348]
We propose an approach for improving sequence modeling based on autoregressive normalizing flows.
Results are presented on three benchmark video datasets, where autoregressive flow-based dynamics improve log-likelihood performance.
arXiv Detail & Related papers (2020-10-07T05:14:37Z)
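For context on the submodule-linearity entry above, here is a hedged sketch of plain task-vector (task arithmetic) merging under assumed uniform coefficients. The cited paper's contribution is to merge each submodule independently, which this minimal version deliberately does not do; all file names and coefficient values are hypothetical.

```python
# Sketch of task-vector merging: build task vectors tau_i = theta_i - theta_base
# from fine-tuned checkpoints, then add a scaled sum back to the base model.
import torch

def task_arithmetic(sd_base, finetuned_sds, lams):
    """theta = theta_base + sum_i lam_i * (theta_i - theta_base)."""
    base = {k: v.float() for k, v in sd_base.items()}
    merged = {k: v.clone() for k, v in base.items()}
    for sd_ft, lam in zip(finetuned_sds, lams):
        for k in merged:
            merged[k] += lam * (sd_ft[k].float() - base[k])
    return merged

base = torch.load("base.pt", map_location="cpu")
experts = [torch.load(p, map_location="cpu") for p in ("math.pt", "code.pt")]
merged = task_arithmetic(base, experts, lams=(0.4, 0.4))  # assumed values
torch.save(merged, "task_arithmetic_merged.pt")
```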
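And for the weight-averaging entry, a minimal sketch of uniform weight averaging over N architecture-identical checkpoints, the baseline that paper reinterprets. File names are placeholders.

```python
# Uniform weight averaging over N architecture-identical checkpoints:
# theta = (1/N) * sum_i theta_i, computed per parameter tensor.
import torch

def average_state_dicts(paths):
    acc, n = None, 0
    for path in paths:
        sd = torch.load(path, map_location="cpu")
        if acc is None:
            acc = {k: v.float().clone() for k, v in sd.items()}
        else:
            for k in acc:
                acc[k] += sd[k].float()
        n += 1
    return {k: v / n for k, v in acc.items()}

avg = average_state_dicts(["ckpt_a.pt", "ckpt_b.pt", "ckpt_c.pt"])
torch.save(avg, "uniform_average.pt")
```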
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.