Merging Beyond: Streaming LLM Updates via Activation-Guided Rotations
- URL: http://arxiv.org/abs/2602.03237v1
- Date: Tue, 03 Feb 2026 08:15:57 GMT
- Title: Merging Beyond: Streaming LLM Updates via Activation-Guided Rotations
- Authors: Yuxuan Yao, Haonan Sheng, Qingsong Lv, Han Wu, Shuqi Liu, Zehua Liu, Zengyan Liu, Jiahui Gao, Haochen Tan, Xiaojin Fu, Haoli Bai, Hing Cheung So, Zhijiang Guo, Linqi Song
- Abstract summary: Streaming Merging is an innovative model updating paradigm that conceptualizes merging as an iterative optimization process. ARM is a strategy designed to approximate gradient descent dynamics. ARM requires only early SFT checkpoints and, through iterative merging, surpasses the fully converged SFT model.
- Score: 55.047454145941366
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The escalating scale of Large Language Models (LLMs) necessitates efficient adaptation techniques. Model merging has gained prominence for its efficiency and controllability. However, existing merging techniques typically serve as post-hoc refinements or focus on mitigating task interference, often failing to capture the dynamic optimization benefits of supervised fine-tuning (SFT). In this work, we propose Streaming Merging, an innovative model updating paradigm that conceptualizes merging as an iterative optimization process. Central to this paradigm is ARM (Activation-guided Rotation-aware Merging), a strategy designed to approximate gradient descent dynamics. By treating merging coefficients as learning rates and deriving rotation vectors from activation subspaces, ARM effectively steers parameter updates along data-driven trajectories. Unlike conventional linear interpolation, ARM aligns semantic subspaces to preserve the geometric structure of high-dimensional parameter evolution. Remarkably, ARM requires only early SFT checkpoints and, through iterative merging, surpasses the fully converged SFT model. Experimental results across model scales (1.7B to 14B) and diverse domains (e.g., math, code) demonstrate that ARM can transcend converged checkpoints. Extensive experiments show that ARM provides a scalable and lightweight framework for efficient model adaptation.
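To make the stated mechanism concrete, here is a minimal NumPy sketch of one streaming-merge step, assuming an orthogonal Procrustes fit on a shared calibration batch as a stand-in for ARM's activation-derived rotation; the coefficient `eta`, the layer shapes, and the calibration setup are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def procrustes_rotation(acts_merged, acts_ckpt):
    # Orthogonal R with acts_ckpt ~= acts_merged @ R, estimated from the
    # same calibration batch pushed through both models (Procrustes).
    U, _, Vt = np.linalg.svd(acts_merged.T @ acts_ckpt)
    return U @ Vt                                   # (d, d), orthogonal

def streaming_merge_step(W_merged, W_ckpt, acts_merged, acts_ckpt, eta=0.3):
    # One streaming-merge step: eta plays the role of a learning rate, and
    # the checkpoint's weights are first rotated into the merged model's
    # activation coordinates, so the update follows an activation-aligned
    # direction rather than a naive linear interpolation.
    R = procrustes_rotation(acts_merged, acts_ckpt)
    W_aligned = R @ W_ckpt                          # align semantic subspaces
    return W_merged + eta * (W_aligned - W_merged)

# Toy usage: two checkpoints whose input activations differ by a rotation.
rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.normal(size=(16, 16)))[0]      # hidden "true" rotation
X = rng.normal(size=(64, 16))                       # calibration activations
W_a, W_b = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
W_next = streaming_merge_step(W_a, W_b, X, X @ Q, eta=0.3)
```

Iterating this step over a stream of early SFT checkpoints reproduces the "merging as optimization" loop the abstract describes, with `eta` acting as the learning rate.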
Related papers
- Beyond Parameter Arithmetic: Sparse Complementary Fusion for Distribution-Aware Model Merging [20.429700094073684]
We propose Sparse Complementary Fusion with reverse KL (SCF-RKL), a novel model merging framework that explicitly controls functional interference through sparse, distribution-aware updates. We evaluate SCF-RKL across a wide range of model scales and architectures, covering both reasoning-focused and instruction-tuned models.
arXiv Detail & Related papers (2026-02-12T08:45:42Z)
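The summary above names sparse, distribution-aware updates but not the rule; the sketch below is only a hedged guess at what a "sparse complementary fusion" step could look like: keep each task vector's largest-magnitude entries and drop coordinates where kept updates conflict in sign. The keep ratio and conflict rule are assumptions, and the reverse-KL objective is not modeled here.

```python
import numpy as np

def sparse_complementary_fuse(base, deltas, keep=0.1):
    # Hypothetical sparse fusion: keep only the largest-magnitude
    # fraction `keep` of each task vector, then zero out coordinates
    # where the sparsified updates disagree in sign before summing.
    masked = []
    for d in deltas:
        thresh = np.quantile(np.abs(d), 1.0 - keep)
        masked.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(masked)
    signs = np.sign(stacked)
    # A coordinate is "complementary" only if no two kept updates conflict.
    conflict = (signs.max(0) > 0) & (signs.min(0) < 0)
    fused = stacked.sum(0)
    fused[conflict] = 0.0
    return base + fused
```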
- ARM: Role-Conditioned Neuron Transplantation for Training-Free Generalist LLM Agent Merging [51.409102048965394]
Agent-Role Merging (ARM) is an activation-guided, role-conditioned neuron transplantation method for model merging in LLM agents. ARM extends existing merging methods from static natural-language tasks to multi-turn agent scenarios.
arXiv Detail & Related papers (2026-01-12T08:31:53Z)
- Leveraging Parameter Space Symmetries for Reasoning Skill Transfer in LLMs [27.978175136002005]
Task arithmetic is a powerful technique for transferring skills between Large Language Models (LLMs). We first align the models' parameter spaces, leveraging the inherent permutation, rotation, and scaling symmetries of Transformer architectures. We successfully transfer advanced reasoning skills to a non-reasoning model.
arXiv Detail & Related papers (2025-11-13T23:20:57Z)
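As a minimal sketch of the alignment-then-arithmetic recipe described above: resolve a rotation symmetry with orthogonal Procrustes on a pair of corresponding weight matrices, then add the aligned task vector. Aligning on weights (rather than activations) and the coefficient `alpha` are assumptions here; permutation and scaling symmetries are omitted for brevity.

```python
import numpy as np

def align_rotation(W_src, W_ref):
    # Resolve a rotation symmetry: orthogonal R minimizing
    # ||R @ W_src - W_ref||_F (Procrustes over weight matrices).
    U, _, Vt = np.linalg.svd(W_ref @ W_src.T)
    return U @ Vt

def transfer_skill(W_base, W_donor_base, W_donor_tuned, alpha=1.0):
    # Task arithmetic after alignment: map the donor's parameter space
    # onto the recipient's, then add the aligned task vector.
    R = align_rotation(W_donor_base, W_base)
    task_vec = R @ (W_donor_tuned - W_donor_base)
    return W_base + alpha * task_vec
```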
- Activation Manifold Projection: Liberating Task-Specific Behaviors from LLM Architectures [0.0]
This paper introduces Cartridge Activation Space Transfer (CAST), a novel framework that liberates LoRA-encoded behaviors from the architecture on which they were trained. CAST learns a set of lightweight, bidirectional projection heads that translate the target model's activation stream into the source model's latent space. Experiments show that CAST-translated adapters achieve 85-95% of the performance of a LoRA fully retrained on the target model.
arXiv Detail & Related papers (2025-10-19T10:55:05Z)
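A minimal sketch of CAST-style projection heads, assuming paired activations from shared prompts and substituting a closed-form ridge-regression fit for whatever training the paper actually uses; `fit_projection_heads` and `apply_adapter` are hypothetical names.

```python
import numpy as np

def fit_projection_heads(acts_target, acts_source, lam=1e-3):
    # Fit bidirectional linear projection heads between the two models'
    # activation spaces by ridge regression on paired activations
    # (same prompts pushed through both models).
    def ridge(X, Y):
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    to_source = ridge(acts_target, acts_source)   # target -> source space
    to_target = ridge(acts_source, acts_target)   # source -> target space
    return to_source, to_target

def apply_adapter(h_target, to_source, to_target, lora_fn):
    # Run a source-space adapter on a target-model activation:
    # project in, apply the adapter's update, project back.
    h_src = h_target @ to_source
    return (h_src + lora_fn(h_src)) @ to_target

# Toy usage: paired activations from 128 shared prompts, dims 32 -> 48.
rng = np.random.default_rng(0)
Ht, Hs = rng.normal(size=(128, 32)), rng.normal(size=(128, 48))
to_src, to_tgt = fit_projection_heads(Ht, Hs)
h_out = apply_adapter(Ht[0], to_src, to_tgt, lambda h: 0.1 * h)
```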
- SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation [62.14510717860079]
We propose SDAR, a Synergistic Diffusion-AutoRegression paradigm that unifies the training efficiency of autoregressive models with the parallel inference capability of diffusion. SDAR performs a lightweight paradigm conversion that transforms a well-trained autoregressive (AR) model into a blockwise diffusion model through brief, data-efficient adaptation. SDAR thereby achieves efficient AR-to-diffusion conversion at minimal cost, preserving AR-level performance while enabling parallel generation.
arXiv Detail & Related papers (2025-10-07T17:29:28Z)
- Harnessing Optimization Dynamics for Curvature-Informed Model Merging [17.42364575754576]
In supervised fine-tuning, multiple capability-based SFT checkpoints must be consolidated into a single model. We introduce Optimization Trajectory Aware (OTA) Merging and Fast Fisher Grafting (FFG). OTA+FFG improves merged-model quality over strong weight-space baselines, reduces negative transfer, and remains robust across sparsity levels.
arXiv Detail & Related papers (2025-09-14T08:59:53Z)
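OTA and FFG are not specified in the summary above; as related background, a curvature-informed merge is often realized as Fisher-weighted averaging. The sketch below uses diagonal Fisher estimates from squared per-example gradients as an illustrative assumption, not the paper's method.

```python
import numpy as np

def diag_fisher(grads):
    # Diagonal Fisher estimate: squared per-example gradients averaged
    # over the batch (eps avoids division by zero later).
    return np.mean(np.square(grads), axis=0) + 1e-8

def fisher_weighted_merge(weights, fishers):
    # Curvature-informed merge: parameters where a checkpoint's loss is
    # sharper (higher Fisher) get more say in the merged value.
    num = sum(F * W for W, F in zip(weights, fishers))
    den = sum(fishers)
    return num / den

# Toy usage with two checkpoints of a single weight matrix.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
F1 = diag_fisher(rng.normal(size=(32, 4, 4)))   # stand-in per-example grads
F2 = diag_fisher(rng.normal(size=(32, 4, 4)))
W_merged = fisher_weighted_merge([W1, W2], [F1, F2])
```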
- Communication-Efficient Wireless Federated Fine-Tuning for Large-Scale AI Models [13.742950928229078]
Low-Rank Adaptation (LoRA) reduces the cost of adapting large models by training compact, low-rank matrices instead of fully fine-tuning them. This paper introduces a wireless federated LoRA fine-tuning framework that optimizes both learning performance and communication efficiency.
arXiv Detail & Related papers (2025-05-01T06:15:38Z)
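To make the LoRA mechanism above concrete: each client trains only low-rank factors, and the server aggregates those small matrices. The shapes, rank, and plain FedAvg-style aggregation below are illustrative assumptions; the paper's wireless resource optimization is not modeled.

```python
import numpy as np

def lora_delta(A, B, alpha=16.0):
    # LoRA update: W_eff = W_frozen + (alpha / r) * B @ A, where
    # A is (r, d_in) and B is (d_out, r), with rank r << d.
    r = A.shape[0]
    return (alpha / r) * (B @ A)

def federated_lora_round(client_factors):
    # One FedAvg-style round over LoRA factors only: clients upload the
    # small (A, B) pairs, never the full weight matrix.
    A_avg = np.mean([A for A, _ in client_factors], axis=0)
    B_avg = np.mean([B for _, B in client_factors], axis=0)
    return A_avg, B_avg

# Toy usage: rank-4 adapters on a 64x64 layer, three clients.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(4, 64)), rng.normal(size=(64, 4)))
           for _ in range(3)]
A_glob, B_glob = federated_lora_round(clients)
W_update = lora_delta(A_glob, B_glob)   # (64, 64) effective delta
```

Averaging A and B separately is the simplest aggregation; the mean of the products B_i @ A_i generally differs, which is one reason federated LoRA work treats aggregation carefully.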
- Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we address the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approaches. We propose a training-free, projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
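The projection step is not spelled out in the summary above; one plausible reading, sketched purely as an assumption, is to project each incoming task vector onto the orthogonal complement of the directions already merged, so later models do not overwrite earlier ones.

```python
import numpy as np

def project_out(v, basis):
    # Remove from flat vector v its components along the orthonormal
    # columns of `basis`.
    if basis.shape[1] == 0:
        return v
    return v - basis @ (basis.T @ v)

def sequential_merge(w_base, task_vectors):
    # Hypothetical training-free sequential merge: each incoming task
    # vector is projected off the span of previously merged directions,
    # and that span then grows by the new (normalized) direction.
    basis = np.zeros((w_base.size, 0))
    w = w_base.copy()
    for tv in task_vectors:
        tv_perp = project_out(tv, basis)
        w = w + tv_perp
        q = tv_perp / (np.linalg.norm(tv_perp) + 1e-12)
        basis = np.column_stack([basis, q])
    return w

# Toy usage: merge three task vectors into a flat 10-d parameter vector.
rng = np.random.default_rng(0)
w0 = rng.normal(size=10)
merged = sequential_merge(w0, [rng.normal(size=10) for _ in range(3)])
```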
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing a minimal set of late pre-trained layers alleviates the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.