Per-Axis Weight Deltas for Frequent Model Updates
- URL: http://arxiv.org/abs/2512.19720v1
- Date: Tue, 16 Dec 2025 16:46:28 GMT
- Title: Per-Axis Weight Deltas for Frequent Model Updates
- Authors: Stefan Kuyumdzhiev, Radostin Cholakov
- Abstract summary: We propose a simple 1-bit delta scheme that stores only the sign of the weight difference together with lightweight per-axis (row/column) FP16 scaling factors. This design preserves the compactness of 1-bit deltas while more accurately capturing variation across weight dimensions.
- Score: 0.4552848064814397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Serving many task-specialized LLM variants is often limited by the large size of fine-tuned checkpoints and the resulting cold-start latency. Since fine-tuned weights differ from their base model by relatively small structured residuals, a natural approach is to represent them as compressed deltas. We propose a simple 1-bit delta scheme that stores only the sign of the weight difference together with lightweight per-axis (row/column) FP16 scaling factors, learned from a small calibration set. This design preserves the compactness of 1-bit deltas while more accurately capturing variation across weight dimensions, leading to improved reconstruction quality over scalar alternatives. From a systems perspective, a streamlined loader that transfers packed deltas in a single operation per module reduces cold-start latency and storage overhead, with artifacts several times smaller than a full FP16 checkpoint. The method is drop-in, requires minimal calibration data, and maintains inference efficiency by avoiding dense reconstruction. Our experimental setup and source code are available at https://github.com/kuiumdjiev/Per-Axis-Weight-Deltas-for-Frequent-Model-Updates.
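To make the scheme concrete, below is a minimal sketch of a per-axis 1-bit delta in PyTorch. It is an illustration only, not the authors' released implementation: the function names are invented, and the per-row scale is set analytically to the mean absolute residual of each row, whereas the paper learns the per-axis FP16 scales from a small calibration set.

```python
# Minimal sketch (not the official code): 1-bit weight delta with per-row FP16 scales.
import torch

def pack_delta(w_finetuned: torch.Tensor, w_base: torch.Tensor):
    """Compress W_ft - W_base into 1 bit per weight plus one FP16 scale per row."""
    delta = w_finetuned.float() - w_base.float()
    bits = delta >= 0                                # sign bit per weight (bit-pack in practice)
    # Mean |delta| per row is the least-squares optimal scale for a sign-only delta.
    scales = delta.abs().mean(dim=1, keepdim=True)   # shape (rows, 1)
    return bits, scales.half()

def apply_delta(w_base: torch.Tensor, bits: torch.Tensor, scales: torch.Tensor):
    """Reconstruct an approximate fine-tuned weight from the packed delta."""
    signs = bits.float() * 2.0 - 1.0                 # {0,1} -> {-1,+1}
    return (w_base.float() + scales.float() * signs).to(w_base.dtype)

# Toy usage: pack a small fine-tuned layer and check the reconstruction error.
base = torch.randn(1024, 1024)
finetuned = base + 0.01 * torch.randn(1024, 1024)
bits, scales = pack_delta(finetuned, base)
approx = apply_delta(base, bits, scales)
print((approx - finetuned).abs().mean())
```

Storage for such a delta is roughly 1 bit per weight plus 16 bits per row, a small fraction of a full FP16 checkpoint, which is what enables the fast single-transfer loading described in the abstract.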
Related papers
- Quantized Visual Geometry Grounded Transformer [67.15451442018258]
This paper proposes the first Quantization framework for VGGTs, namely QuantVGGT. We introduce Dual-Smoothed Fine-Grained Quantization, which integrates pre-global Hadamard rotation and post-local channel smoothing. We also design Noise-Filtered Diverse Sampling, which filters outliers via deep-layer statistics.
arXiv Detail & Related papers (2025-09-25T15:17:11Z)
- Delta-SVD: Efficient Compression for Personalized Text-to-Image Models [25.0585375727713]
We present Delta-SVD, a post-hoc, training-free compression method that targets the parameter-weight updates induced by DreamBooth fine-tuning. We show that Delta-SVD achieves substantial compression with negligible loss in generation quality, as measured by CLIP score, SSIM and FID.
arXiv Detail & Related papers (2025-08-23T01:21:46Z)
- Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression [57.71917274869577]
UltraDelta is a data-free delta compression pipeline that achieves both ultra-high compression and strong performance. It is designed to minimize redundancy, maximize information, and stabilize performance across inter-layer, intra-layer, and global dimensions.
arXiv Detail & Related papers (2025-05-19T10:37:22Z)
- Seeing Delta Parameters as JPEG Images: Data-Free Delta Compression with Discrete Cosine Transform [51.29604910007176]
We introduce Delta-DCT, the first data-free delta compression method inspired by classic JPEG image compression, leveraging the Discrete Cosine Transform (DCT). The proposed Delta-DCT requires no training or data calibration, while achieving performance comparable to or even surpassing the original fine-tuned models under 1-bit-equivalent delta compression ratios on different kinds of models, including: (1) recently released LLMs of different sizes from 7B to 13B, (2) relatively smaller language models including RoBERTa and T5, (3) variants of vision transformer models, and (4) multi-modal BEiT-3 models.
arXiv Detail & Related papers (2025-03-09T16:03:48Z)
- Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement [9.454314879815337]
Pruned generative models often exhibit dominant singular vectors, hindering fine-tuning efficiency and leading to suboptimal performance. We introduce Singular Value Scaling (SVS), a versatile technique for refining pruned weights, applicable to both model types. SVS improves compression performance across model types without additional training costs.
arXiv Detail & Related papers (2024-12-23T08:40:08Z)
- IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models [68.55148272295916]
IntLoRA adapts quantized diffusion models with integer-type low-rank parameters, bringing inference efficiency into the tuning stage. During inference, IntLoRA weights can be seamlessly merged into the pre-trained weights to directly obtain quantized downstream weights without PTQ.
arXiv Detail & Related papers (2024-10-29T05:50:17Z)
- DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization [17.501956455837707]
Large language models achieve exceptional performance on various downstream tasks through supervised fine-tuning.
Current methods that compress the delta weight struggle to achieve ultra-high compression.
We propose DeltaDQ, a novel distribution-driven delta compression framework that achieves ultra-high compression of the delta weights.
arXiv Detail & Related papers (2024-10-11T09:44:16Z)
- BitDelta: Your Fine-Tune May Only Be Worth One Bit [57.558376557639555]
Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks.
We introduce a simple method, BitDelta, which successfully quantizes this delta down to 1 bit without compromising performance.
By enabling the use of a single high-precision base model accompanied by multiple 1-bit deltas, BitDelta dramatically reduces GPU memory requirements by more than 10x (a minimal sketch of this per-tensor variant follows the list below).
arXiv Detail & Related papers (2024-02-15T18:50:06Z)
- OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models [81.7855202178564]
We present OpenDelta, an open-source library that overcomes the limitations of existing delta tuning implementations by providing a plug-and-play implementation of various delta tuning methods.
Our novel techniques eliminate the need to modify the backbone PTMs' code, making OpenDelta compatible with different, even novel PTMs.
arXiv Detail & Related papers (2023-07-05T16:30:14Z)
- Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights? [81.77457373726736]
Given a robust model trained to be resilient to one or multiple types of distribution shifts, how is that "robustness" encoded in the model weights?
We propose a minimalistic model robustness "patching" framework that carries a model trained on clean data together with its pre-extracted robust weight signatures (RWSs). In this way, injecting a certain type of robustness into the model reduces to directly adding the corresponding RWS to its weights.
arXiv Detail & Related papers (2023-02-24T06:44:19Z)
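For contrast with the per-axis scheme of the main paper, here is a minimal sketch of the per-tensor variant that the BitDelta entry above describes: one sign bit per weight plus a single scale for the entire weight matrix. This is illustrative only; the function names are invented and the scale is simply set to the mean absolute residual (BitDelta itself additionally calibrates its scales).

```python
# Minimal sketch (illustrative, not BitDelta's released code): per-tensor 1-bit delta.
import torch

def pack_delta_per_tensor(w_finetuned: torch.Tensor, w_base: torch.Tensor):
    """One sign bit per weight plus a single FP16 scale for the whole matrix."""
    delta = w_finetuned.float() - w_base.float()
    bits = delta >= 0                   # sign bit per weight
    scale = delta.abs().mean()          # one scalar scale per weight matrix
    return bits, scale.half()

def apply_delta_per_tensor(w_base: torch.Tensor, bits: torch.Tensor, scale: torch.Tensor):
    signs = bits.float() * 2.0 - 1.0    # {0,1} -> {-1,+1}
    return (w_base.float() + scale.float() * signs).to(w_base.dtype)
```

The per-axis scheme above replaces this single scalar with a vector of row or column scales at negligible extra storage, which is what lets it track variation across weight dimensions more closely.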
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.