Fugu-MT Paper Translation (Abstract): Progressive Weight Loading: Accelerating Initial Inference and Gradually Boosting Performance on Resource-Constrained Environments

Paper overview: Progressive Weight Loading: Accelerating Initial Inference and Gradually Boosting Performance on Resource-Constrained Environments

arxiv url: http://arxiv.org/abs/2509.22319v2
Date: Wed, 01 Oct 2025 13:53:12 GMT
Status: Translation complete
Last updated in system: 2025-10-02 14:33:21.789592
Title: Progressive Weight Loading: Accelerating Initial Inference and Gradually Boosting Performance on Resource-Constrained Environments
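Title (reference translation): Progressive Weight Loading: Speeding Up Initial Inference and Incrementally Improving Performance in Resource-Constrained Environments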
Authors: Hyunwoo Kim, Junha Lee, Mincheol Choi, Jeonghwan Lee, Jaeshin Cho
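Abstract summary: Progressive Weight Loading (PWL) is a technique that enables fast initial inference by first deploying a lightweight student model and then incrementally replacing its layers with those of a pre-trained teacher model. Experiments on VGG, ResNet, and ViT architectures show that models trained with PWL maintain competitive distillation performance and gradually improve accuracy as teacher layers are loaded.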
Reference score (independently computed attention score): 8.020686883632594
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Deep learning models have become increasingly large and complex, resulting in higher memory consumption and computational demands. Consequently, model loading times and initial inference latency have increased, posing significant challenges in mobile and latency-sensitive environments where frequent model loading and unloading are required, which directly impacts user experience. While Knowledge Distillation (KD) offers a solution by compressing large teacher models into smaller student ones, it often comes at the cost of reduced performance. To address this trade-off, we propose Progressive Weight Loading (PWL), a novel technique that enables fast initial inference by first deploying a lightweight student model, then incrementally replacing its layers with those of a pre-trained teacher model. To support seamless layer substitution, we introduce a training method that not only aligns intermediate feature representations between student and teacher layers, but also improves the overall output performance of the student model. Our experiments on VGG, ResNet, and ViT architectures demonstrate that models trained with PWL maintain competitive distillation performance and gradually improve accuracy as teacher layers are loaded, matching the final accuracy of the full teacher model without compromising initial inference speed. This makes PWL particularly suited for dynamic, resource-constrained deployments where both responsiveness and performance are critical.
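Abstract (reference translation): Deep learning models are becoming ever larger and more complex, increasing memory consumption and computational demands. As a result, model loading time and initial inference latency grow, creating a significant challenge that directly affects user experience in mobile and latency-sensitive environments where models must be loaded and unloaded frequently. Knowledge Distillation (KD) offers a solution by compressing a large teacher model into a smaller student, but often at the cost of reduced performance. To address this trade-off, we propose Progressive Weight Loading (PWL), a new technique that enables fast initial inference by first deploying a lightweight student model and then replacing its layers with those of a pre-trained teacher model. To support seamless layer substitution, we propose a training method that not only aligns intermediate feature representations between student and teacher layers but also improves the overall output performance of the student model. Experiments on VGG, ResNet, and ViT architectures show that models trained with PWL maintain competitive distillation performance and gradually improve accuracy as teacher layers are loaded, without compromising initial inference speed. This makes PWL particularly well suited to dynamic, resource-constrained deployments where both responsiveness and performance matter.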
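
The layer-substitution idea in the abstract can be illustrated with a small, framework-free sketch. This is not the authors' implementation: the `ProgressiveModel` class, the `load_teacher_block` method, the scalar toy blocks, and the front-to-back loading order are all assumptions made for illustration. Each "block" is a plain function standing in for a network layer, and the student blocks are deliberately slightly off from the teacher blocks.

```python
from typing import Callable, List, Sequence

# A "block" stands in for one network layer (hypothetical toy type).
Block = Callable[[int], int]


class ProgressiveModel:
    """Starts with student blocks and swaps in teacher blocks one at a time."""

    def __init__(self, student_blocks: Sequence[Block]) -> None:
        self.blocks: List[Block] = list(student_blocks)
        self.loaded: List[bool] = [False] * len(self.blocks)

    def load_teacher_block(self, index: int, teacher_block: Block) -> None:
        # Replace one student block; inference stays available throughout.
        self.blocks[index] = teacher_block
        self.loaded[index] = True

    def __call__(self, x: int) -> int:
        for block in self.blocks:
            x = block(x)
        return x


# Toy teacher blocks and slightly inaccurate student counterparts (assumed).
teacher_blocks: List[Block] = [lambda x: 2 * x, lambda x: x + 3, lambda x: x * x]
student_blocks: List[Block] = [lambda x: 2 * x - 1, lambda x: x + 2, lambda x: x * x - 1]

if __name__ == "__main__":
    model = ProgressiveModel(student_blocks)
    print("student only:", model(2))
    # Loading order is an assumption here; the abstract does not specify one.
    for i, block in enumerate(teacher_blocks):
        model.load_teacher_block(i, block)
        print(f"after loading teacher block {i}:", model(2))
```

In this toy setup the output moves toward the teacher's result as blocks are swapped in, but only because every block consumes and produces the same kind of value. In a real network, that interchangeability is what the feature-alignment training described in the abstract is meant to provide.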

Related papers