Fugu-MT Paper Translation (Abstract): A Unified Framework for Knowledge Transfer in Bidirectional Model Scaling

Paper abstract: A Unified Framework for Knowledge Transfer in Bidirectional Model Scaling

arxiv url: http://arxiv.org/abs/2603.07506v1
Date: Sun, 08 Mar 2026 07:23:16 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:14.698354
Title: A Unified Framework for Knowledge Transfer in Bidirectional Model Scaling
Title (reference translation): A Unified Framework for Knowledge Transfer in Bidirectional Model Scaling
Authors: Jianlu Shen, Fu Feng, Jiaze Xu, Yucheng Xie, Jiaqi Lv, Xin Geng
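Abstract summary: This paper proposes BoT, the first size-agnostic framework to unify S2L and L2S scaling. Its core insight is to treat model weights as continuous signals. Extensive experiments on DeiT, BERT, and GPT show significant pre-training FLOPs savings.
Reference score (independently calculated attention score): 31.9971752399134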
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transferring pre-trained knowledge from a source model to a target model of a different architectural size is a key challenge for flexible and efficient model scaling. However, current parameter-space methods treat Small-to-Large (S2L) and Large-to-Small (L2S) scaling as separate, incompatible problems, focusing on parameter synthesis and selection, respectively. This fragmented perspective has resulted in specialized tools, hindering a unified, bidirectional framework. In this paper, we propose BoT (Bidirectional knowledge Transfer), the first size-agnostic framework to unify S2L and L2S scaling. Our core insight is to treat model weights as continuous signals, where models of different sizes represent distinct discretizations of the transferable knowledge. This multi-resolution perspective directly casts S2L and L2S scaling as the signal processing operations of upsampling and downsampling, naturally leading to the adoption of the Discrete Wavelet Transform (DWT) and its Inverse (IDWT). BoT leverages the recursive nature of wavelets, using the decomposition level as a dynamic scaling factor to bridge disparate model sizes in a parameter-free and computationally efficient manner. Extensive experiments on DeiT, BERT, and GPT demonstrate significant pre-training FLOPs savings (up to 67.1% for S2L, 52.8% for L2S) and state-of-the-art performance on benchmarks like GLUE and SQuAD.
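Abstract (reference translation): Moving pre-trained knowledge from a source model to a target model of a different architectural size is an important challenge for flexible and efficient model scaling. Current parameter-space methods, however, handle Small-to-Large (S2L) and Large-to-Small (L2S) scaling as independent, incompatible problems, concentrating on parameter synthesis for the former and parameter selection for the latter. This fragmented view has produced specialized tools and has stood in the way of a unified, bidirectional framework. This paper proposes BoT, the first size-agnostic framework to unify S2L and L2S scaling. The central insight is to treat model weights as continuous signals, with models of different sizes representing different discretizations of the transferable knowledge. This multi-resolution view casts S2L and L2S scaling directly as the signal processing operations of upsampling and downsampling, which leads to the adoption of the Discrete Wavelet Transform (DWT) and its inverse (IDWT). BoT exploits the recursive nature of wavelets, using the decomposition level as a dynamic scaling factor to bridge different model sizes in a parameter-free and computationally efficient way. Extensive experiments on DeiT, BERT, and GPT show significant savings in pre-training FLOPs (up to 67.1% for S2L and 52.8% for L2S) and state-of-the-art performance on benchmarks such as GLUE and SQuAD.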
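The upsampling and downsampling view described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name a wavelet, so the Haar wavelet is assumed here, and the function names (`haar_dwt`, `haar_idwt`, `shrink`, `grow`, `resize_matrix`) are hypothetical. The sketch also assumes that growing a weight matrix means treating it as the approximation band with all detail coefficients set to zero, which the abstract does not confirm. The `levels` argument plays the role of the decomposition level, so each level halves or doubles each dimension.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One level of the orthonormal Haar DWT on an even-length list."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild each pair of neighbouring samples."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def shrink(x, levels):
    """L2S sketch: keep only the approximation band after `levels` DWT steps."""
    for _ in range(levels):
        x, _detail = haar_dwt(x)
    return x

def grow(x, levels):
    """S2L sketch: treat x as an approximation band with zero detail (assumption)."""
    for _ in range(levels):
        x = haar_idwt(x, [0.0] * len(x))
    return x

def transpose(m):
    return [list(col) for col in zip(*m)]

def resize_matrix(w, levels, op):
    """Apply `op` (shrink or grow) along every row, then along every column."""
    rows = [op(r, levels) for r in w]
    cols = [op(c, levels) for c in transpose(rows)]
    return transpose(cols)

small = [[1.0, 2.0], [3.0, 4.0]]
large = resize_matrix(small, 1, grow)    # 2x2 -> 4x4
back = resize_matrix(large, 1, shrink)   # 4x4 -> 2x2
```

Under these assumptions, shrinking the grown matrix recovers the original weights up to floating-point error, because the discarded detail coefficients were zero. A real large model has non-zero detail bands, so the L2S direction in this sketch is lossy in general.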

List of related papers