Fugu-MT Paper Translation (Abstract): Per-Axis Weight Deltas for Frequent Model Updates

Paper overview: Per-Axis Weight Deltas for Frequent Model Updates

arxiv url: http://arxiv.org/abs/2512.19720v1
Date: Tue, 16 Dec 2025 16:46:28 GMT
Status: Translation complete
Last updated in system: 2025-12-24 19:17:49.591731
Title: Per-Axis Weight Deltas for Frequent Model Updates
Authors: Stefan Kuyumdzhiev, Radostin Cholakov
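Abstract summary: This paper proposes a simple 1-bit delta scheme that stores only the sign of the weight difference together with lightweight per-axis FP16 scaling factors. The design preserves the compactness of 1-bit deltas while capturing variation across weight dimensions more accurately.
Reference score (independently computed attention score): 0.4552848064814397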
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Serving many task-specialized LLM variants is often limited by the large size of fine-tuned checkpoints and the resulting cold-start latency. Since fine-tuned weights differ from their base model by relatively small structured residuals, a natural approach is to represent them as compressed deltas. We propose a simple 1-bit delta scheme that stores only the sign of the weight difference together with lightweight per-axis (row/column) FP16 scaling factors, learned from a small calibration set. This design preserves the compactness of 1-bit deltas while more accurately capturing variation across weight dimensions, leading to improved reconstruction quality over scalar alternatives. From a systems perspective, a streamlined loader that transfers packed deltas in a single operation per module reduces cold-start latency and storage overhead, with artifacts several times smaller than a full FP16 checkpoint. The method is drop-in, requires minimal calibration data, and maintains inference efficiency by avoiding dense reconstruction. Our experimental setup and source code are available at https://github.com/kuiumdjiev/Per-Axis-Weight-Deltas-for-Frequent-Model-Updates.
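The sign-plus-scale idea in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the paper learns its per-axis scales from a small calibration set, whereas this sketch substitutes the mean absolute delta per row, which is the least-squares optimal scale for a fixed sign pattern when only weight reconstruction error is considered. All function names and the toy matrices are hypothetical, and the dense reconstruction is shown only to compare errors (the abstract states that the actual method avoids dense reconstruction).

```python
def signs_of(delta):
    """Sign of each delta entry as +1 or -1 (zero is treated as +1)."""
    return [[1 if d >= 0 else -1 for d in row] for row in delta]

def per_row_scales(delta):
    """Assumption: mean |delta| per row, standing in for calibrated scales."""
    return [sum(abs(d) for d in row) / len(row) for row in delta]

def scalar_scale(delta):
    """Single scale for the whole matrix (the scalar baseline)."""
    flat = [abs(d) for row in delta for d in row]
    return sum(flat) / len(flat)

def pack_signs(sign_row):
    """Pack one row of signs into bytes, one bit per weight (bit set means +1)."""
    bits = 0
    for i, s in enumerate(sign_row):
        if s > 0:
            bits |= 1 << i
    return bits.to_bytes((len(sign_row) + 7) // 8, "little")

def reconstruct(base, signs, row_scales):
    """Dense reconstruction base + scale * sign, for error checking only."""
    return [
        [b + scale * s for b, s in zip(base_row, sign_row)]
        for base_row, sign_row, scale in zip(base, signs, row_scales)
    ]

def squared_error(a, b):
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Toy weights: the two rows change by very different magnitudes.
base = [[0.5, -0.25, 1.0, 0.0], [0.1, 0.2, -0.3, 0.4]]
fine = [[0.51, -0.26, 1.01, -0.01], [0.3, 0.0, -0.1, 0.6]]
delta = [[f - b for f, b in zip(fr, br)] for fr, br in zip(fine, base)]

signs = signs_of(delta)
packed = [pack_signs(row) for row in signs]

per_axis = reconstruct(base, signs, per_row_scales(delta))
scalar = reconstruct(base, signs, [scalar_scale(delta)] * len(delta))

per_axis_err = squared_error(per_axis, fine)
scalar_err = squared_error(scalar, fine)
print("per-row error:", per_axis_err, "scalar error:", scalar_err)
```

In this toy case each row's delta has a uniform magnitude, so a per-row scale recovers the fine-tuned weights almost exactly, while a single scalar must compromise between the two rows. For a matrix with `m` rows and `n` columns, the stored delta costs roughly `m * n` bits for the signs plus `16 * m` bits for the FP16 row scales, compared with `16 * m * n` bits for a full FP16 copy of that matrix.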

Related papers