Fugu-MT Paper Translation (Summary): Quantized Adaptive Subgradient Algorithms and Their Applications

Paper summary: Quantized Adaptive Subgradient Algorithms and Their Applications

arxiv url: http://arxiv.org/abs/2208.05631v1
Date: Thu, 11 Aug 2022 04:04:03 GMT
Status: Translation complete
Last updated in system: 2022-08-12 12:56:13.545593
Title: Quantized Adaptive Subgradient Algorithms and Their Applications
Title (reference translation): Quantized Adaptive Subgradient Algorithms and Their Applications
Authors: Ke Xu, Jianqiao Wangni, Yifan Zhang, Deheng Ye, Jiaxiang Wu and Peilin Zhao
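Abstract summary: This paper proposes quantized composite mirror descent adaptive subgradient (QCMD adagrad) and quantized regularized dual average adaptive subgradient (QRDA adagrad) for distributed training. An adaptive learning rate matrix based on quantized gradients is constructed to balance communication cost, accuracy, and model sparsity.
Reference score (attention score computed independently by this site): 39.103587572626026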
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Data explosion and growing model sizes drive remarkable advances in large-scale machine learning, but they also make model training time-consuming and model storage difficult. Distributed model training offers high computational efficiency and fewer device limitations, yet two main difficulties remain in addressing these issues. On one hand, the communication cost of exchanging information, e.g., stochastic gradients, among different workers is a key bottleneck for distributed training efficiency. On the other hand, a model with fewer parameters is easier to store and communicate, but it risks degraded model performance. To balance communication costs, model capacity, and model performance simultaneously, we propose quantized composite mirror descent adaptive subgradient (QCMD adagrad) and quantized regularized dual average adaptive subgradient (QRDA adagrad) for distributed training. Specifically, we explore the combination of gradient quantization and sparse models to reduce the communication cost per iteration in distributed training. A quantized gradient-based adaptive learning rate matrix is constructed to achieve a balance between communication costs, accuracy, and model sparsity. Moreover, we show theoretically that a large quantization error introduces extra noise, which affects the convergence and sparsity of the model. Therefore, a threshold quantization strategy with a relatively small error is adopted in QCMD adagrad and QRDA adagrad to improve the signal-to-noise ratio and preserve the sparsity of the model. Both theoretical analyses and empirical results demonstrate the efficacy and efficiency of the proposed algorithms.
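Abstract (reference translation): Data explosion and increasing model size drive remarkable progress in large-scale machine learning, while also making model training time-consuming and model storage difficult. In the distributed model training setting, which has high computational efficiency and fewer device limitations, two major challenges remain in addressing these problems. On one hand, the communication cost of exchanging information such as stochastic gradients among workers is a key bottleneck for distributed training efficiency. On the other hand, a model with fewer parameters is easier to store and communicate, but it risks harming model performance. To balance communication cost, model capacity, and model performance at the same time, the paper proposes quantized composite mirror descent adaptive subgradient (QCMD adagrad) and quantized regularized dual average adaptive subgradient (QRDA adagrad) for distributed training. Specifically, combining gradient quantization with a sparse model reduces the communication cost per iteration in distributed training. An adaptive learning rate matrix based on quantized gradients is constructed to balance communication cost, accuracy, and model sparsity. Furthermore, the theory shows that a large quantization error introduces extra noise, which affects the convergence and sparsity of the model. QCMD adagrad and QRDA adagrad therefore adopt a threshold quantization strategy with relatively small error to improve the signal-to-noise ratio and preserve model sparsity. Both theoretical analysis and experimental results demonstrate the effectiveness and efficiency of the proposed algorithms.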
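The combination described in the abstract, threshold quantization of gradients followed by an adaptive subgradient update that keeps the model sparse, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ternary quantizer `threshold_quantize`, its threshold `tau`, and the scaling by the mean kept magnitude are hypothetical choices, and `adagrad_l1_step` uses the standard diagonal AdaGrad composite mirror descent update with an L1 regularizer, with illustrative values for `eta`, `lam`, and `delta`.

```python
import math


def threshold_quantize(grad, tau):
    """Hypothetical ternary threshold quantizer (an assumption, not the paper's exact scheme).

    Entries with |g| >= tau are replaced by sign(g) * scale, where scale is the
    mean magnitude of the kept entries; all other entries become 0.
    """
    kept = [abs(g) for g in grad if abs(g) >= tau]
    if not kept:
        return [0.0] * len(grad)
    scale = sum(kept) / len(kept)
    return [math.copysign(scale, g) if abs(g) >= tau else 0.0 for g in grad]


def adagrad_l1_step(x, q, accum, eta=0.1, lam=0.01, delta=1e-3):
    """One diagonal AdaGrad composite mirror descent step with an L1 regularizer.

    x     : current weights
    q     : (quantized) gradient used in place of the exact gradient
    accum : running sum of squared gradients per coordinate, updated in place
    Returns the new weights; soft-thresholding sets small weights exactly to 0.
    """
    new_x = []
    for i, (xi, qi) in enumerate(zip(x, q)):
        accum[i] += qi * qi
        h = delta + math.sqrt(accum[i])   # per-coordinate adaptive scaling
        z = xi - (eta / h) * qi           # gradient step
        shrink = lam * eta / h            # L1 shrinkage for this coordinate
        new_x.append(math.copysign(max(abs(z) - shrink, 0.0), z))
    return new_x


if __name__ == "__main__":
    weights = [0.5, 0.001, 0.0, -0.5]
    state = [0.0] * len(weights)
    gradient = [0.9, -0.05, 0.02, -1.1]
    quantized = threshold_quantize(gradient, tau=0.5)
    weights = adagrad_l1_step(weights, quantized, state)
    print(quantized, weights)
```

In a distributed run, each worker would send only its quantized gradient and the receiver would combine them before calling `adagrad_l1_step`; that aggregation step is left out here. Note that with a small `delta`, coordinates that have accumulated no gradient receive strong shrinkage, which is one way the update pushes unused weights to exactly zero.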

Related papers

Adaptive Multi-Fidelity Reinforcement Learning for Variance Reduction in Engineering Design Optimization [0.0]
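Multi-fidelity reinforcement learning (RL) frameworks use computational resources efficiently by integrating analysis models of varying accuracy and cost. This work proposes an adaptive multi-fidelity RL framework that dynamically leverages multiple heterogeneous, non-hierarchical low-fidelity models alongside a high-fidelity model. The effectiveness of the proposed approach is demonstrated on an octocopter design optimization problem using two low-fidelity models and a high-fidelity simulator.
Paper reference translation (metadata) (2025-03-23T22:29:08Z)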
Practical multi-fidelity machine learning: fusion of deterministic and Bayesian models [0.34592277400656235]
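Multi-fidelity machine learning methods integrate scarce, resource-intensive high-fidelity data with abundant but less accurate low-fidelity data. This work proposes a practical multi-fidelity strategy for problems spanning low- and high-dimensional domains.
Paper reference translation (metadata) (2024-07-21T10:40:50Z)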
Clipped Uniform Quantizers for Communication-Efficient Federated Learning [3.38220960870904]
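This paper proposes a uniform quantization approach for federated learning settings. By using optimal clipping thresholds and an adaptive quantization scheme, it significantly reduces the bit requirements of model weight transmission.
Paper reference translation (metadata) (2024-05-22T05:48:25Z)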
EsaCL: Efficient Continual Learning of Sparse Models [10.227171407348326]
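The main challenge in continual learning settings is to efficiently learn a sequence of tasks without forgetting how to perform previously learned tasks. This work proposes an efficient continual learning method for sparse models (EsaCL) that automatically prunes redundant parameters without adversely affecting the model's predictive power.
Paper reference translation (metadata) (2024-01-11T04:59:44Z)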
Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
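This work takes a theoretical look at independent subnetwork training (IST). IST is a recently proposed and highly effective technique for addressing the problems the paper describes. The work identifies fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
Paper reference translation (metadata) (2023-06-28T18:14:22Z)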
Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach [54.311495894129585]
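This work studies the limits of the communication cost of model aggregation in distributed learning from a rate-distortion perspective. For SignSGD, the communication gain from exploiting the correlation between worker nodes is found to be significant.
Paper reference translation (metadata) (2022-06-28T13:10:40Z)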
ClusterQ: Semantic Feature Distribution Alignment for Data-Free Quantization [111.12063632743013]
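This paper proposes a data-free quantization method called ClusterQ. To enhance the inter-class separability of semantic features, it clusters and aligns feature distribution statistics. It also incorporates intra-class variance to address class-wise mode collapse.
Paper reference translation (metadata) (2022-04-30T06:58:56Z)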
Adaptive Quantization of Model Updates for Communication-Efficient Federated Learning [75.45968495410047]
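Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning. Gradient quantization is an effective way to reduce the number of bits required to communicate each model update. This work proposes an adaptive quantization strategy called AdaFL that aims to achieve communication efficiency and a low error floor.
Paper reference translation (metadata) (2021-02-08T19:14:21Z)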
MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
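This work proposes MixKD, a data-agnostic distillation framework. Under reasonable conditions, MixKD yields a smaller gap between the generalization error and the empirical error. Experiments under limited-data settings, together with ablation studies, further demonstrate the advantages of the proposed approach.
Paper reference translation (metadata) (2020-11-01T18:47:51Z)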

The related papers list is generated automatically from the titles and abstracts of papers on this site.

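This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).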