Fugu-MT Paper Translation (Abstract): Exploring the Impact of Virtualization on the Usability of the Deep Learning Applications

Paper overview: Exploring the Impact of Virtualization on the Usability of the Deep Learning Applications

arxiv url: http://arxiv.org/abs/2112.09780v1
Date: Fri, 17 Dec 2021 21:51:34 GMT
Status: Translation complete
Last updated in system: 2021-12-23 06:44:55.797846
Title: Exploring the Impact of Virtualization on the Usability of the Deep Learning Applications
Title (reference translation): Exploring the impact of virtualization on the usability of deep learning applications
Authors: Davood G. Samani, Mohsen Amini Salehi
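Abstract summary: This study measures the impact of four popular execution platforms on the E2E inference time of four types of deep learning applications. The notable finding is that solution architects must be aware of the characteristics of DL applications.
Reference score (independently computed attention score): 1.527276935569975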
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep Learning-based (DL) applications are becoming increasingly popular and advancing at an unprecedented pace. While many research works are being undertaken to enhance Deep Neural Networks (DNN) -- the centerpiece of DL applications -- practical deployment challenges of these applications in the Cloud and Edge systems, and their impact on the usability of the applications have not been sufficiently investigated. In particular, the impact of deploying different virtualization platforms, offered by the Cloud and Edge, on the usability of DL applications (in terms of the End-to-End (E2E) inference time) has remained an open question. Importantly, resource elasticity (by means of scale-up), CPU pinning, and processor type (CPU vs GPU) configurations have shown to be influential on the virtualization overhead. Accordingly, the goal of this research is to study the impact of these potentially decisive deployment options on the E2E performance, thus, usability of the DL applications. To that end, we measure the impact of four popular execution platforms (namely, bare-metal, virtual machine (VM), container, and container in VM) on the E2E inference time of four types of DL applications, upon changing processor configuration (scale-up, CPU pinning) and processor types. This study reveals a set of interesting and sometimes counter-intuitive findings that can be used as best practices by Cloud solution architects to efficiently deploy DL applications in various systems. The notable finding is that the solution architects must be aware of the DL application characteristics, particularly, their pre- and post-processing requirements, to be able to optimally choose and configure an execution platform, determine the use of GPU, and decide the efficient scale-up range.
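Abstract (reference translation): Deep learning-based (DL) applications are spreading and advancing at an unprecedented pace. While much research is under way to enhance deep neural networks (DNNs), the centerpiece of DL applications, the practical deployment challenges of these applications in Cloud and Edge systems, and their impact on application usability, have not been sufficiently investigated. In particular, the impact of deploying the different virtualization platforms offered by the Cloud and Edge on the usability of DL applications (in terms of End-to-End (E2E) inference time) remains an open question. Importantly, resource elasticity (by means of scale-up), CPU pinning, and processor type (CPU vs. GPU) configurations have been shown to influence virtualization overhead. Accordingly, the goal of this research is to study the impact of these potentially decisive deployment options on E2E performance. To that end, we measure the impact of four popular execution platforms (bare-metal, virtual machine (VM), container, and container in VM) on the E2E inference time of four types of DL applications when the processor configuration (scale-up, CPU pinning) and processor type are changed. This study reveals a set of interesting and sometimes counter-intuitive findings that Cloud solution architects can use as best practices to deploy DL applications efficiently in various systems. The notable finding is that solution architects must be aware of DL application characteristics, particularly their pre- and post-processing requirements, in order to choose and configure an execution platform optimally, decide whether to use a GPU, and determine the efficient scale-up range.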
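The abstract defines usability in terms of E2E inference time, which covers pre-processing and post-processing as well as the model call itself, and names CPU pinning as one of the configurations studied. The following is a minimal sketch of how such a measurement could be taken, not the authors' benchmark harness. The function names (`pin_to_cpus`, `e2e_inference_time`) and the three stand-in stages are hypothetical, and pinning uses `os.sched_setaffinity`, which is available only on Linux.

```python
import os
import time
from statistics import mean


def pin_to_cpus(cpus):
    """Pin this process to the given CPU ids; return True if pinning was applied."""
    if hasattr(os, "sched_setaffinity"):  # Linux only
        allowed = set(cpus) & os.sched_getaffinity(0)
        if allowed:
            os.sched_setaffinity(0, allowed)
            return True
    return False


def e2e_inference_time(preprocess, infer, postprocess, raw_input, repeats=5):
    """Return mean wall-clock seconds for each stage and for the whole request."""
    samples = {"pre": [], "infer": [], "post": [], "e2e": []}
    for _ in range(repeats):
        t0 = time.perf_counter()
        x = preprocess(raw_input)
        t1 = time.perf_counter()
        y = infer(x)
        t2 = time.perf_counter()
        postprocess(y)
        t3 = time.perf_counter()
        samples["pre"].append(t1 - t0)
        samples["infer"].append(t2 - t1)
        samples["post"].append(t3 - t2)
        samples["e2e"].append(t3 - t0)
    return {stage: mean(values) for stage, values in samples.items()}


# Hypothetical stand-in stages; a real application would decode input,
# run a DNN, and format the prediction.
def demo_pre(raw):
    return [float(v) / 255.0 for v in raw]


def demo_infer(x):
    return sum(v * v for v in x)


def demo_post(y):
    return {"score": round(y, 3)}


if __name__ == "__main__":
    pinned = pin_to_cpus([0])
    timings = e2e_inference_time(demo_pre, demo_infer, demo_post, list(range(10000)))
    print("pinned:", pinned)
    for stage, seconds in timings.items():
        print(f"{stage}: {seconds * 1e3:.3f} ms")
```

Reporting the stages separately shows how much of the E2E time falls outside the model call, which is the application characteristic the abstract asks architects to take into account.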

Related papers

Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts [0.0]
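Machine learning algorithms have been shown to be increasingly valuable tools. Running inference on larger models without accelerators is infeasible. We aim to speed up LLMs in CPU-only environments by modifying the cache architecture.
Paper reference translation (metadata) (2025-06-02T16:12:22Z)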
Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications [22.053978157017877]
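This paper describes methods and insights for training small language models (SLMs). The work focuses on two key techniques: (1) knowledge distillation and (2) model compression via quantization and pruning. It details the impact of these techniques on a variety of use cases in a large professional social network platform and shares deployment lessons.
Paper reference translation (metadata) (2025-02-20T06:40:12Z)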
Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls [22.49750818224266]
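Demand is growing for deploying computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Mobile devices have the potential to accelerate DL inference through parallel execution across heterogeneous processors. This paper presents a holistic study evaluating the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors.
Paper reference translation (metadata) (2024-05-03T04:47:23Z)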
Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures [56.200335252600354]
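Deploying trained models in environments that differ from their native development environment is common practice. This has led to the introduction of interchange formats such as ONNX, which comes with its own infrastructure and serves as a standard format.
Paper reference translation (metadata) (2024-02-21T09:18:44Z)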
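ETuner: A Redundancy-Aware Framework for Efficient Continual Learning Application on Edge Devices [47.365775210055396]
We propose ETuner, an efficient edge continual learning framework that optimizes inference accuracy, fine-tuning execution time, and energy efficiency. Experimental results show that ETuner reduces overall fine-tuning execution time by 64% and energy consumption by 56%, and improves average inference accuracy by 1.75% over the immediate model fine-tuning approach.
Paper reference translation (metadata) (2024-01-30T02:41:05Z)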
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
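This work introduces a framework for developing efficient, portable Deep Learning and High Performance Computing kernels. It decomposes kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs), and 2) expressing the logical loops around the TPPs in a high-level, declarative fashion. We demonstrate the efficacy of this approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on a range of CPU platforms.
Paper reference translation (metadata) (2023-04-25T05:04:44Z)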
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
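Adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. By running simulations on a validated NLP edge accelerator, we demonstrate the benefit of mapping the model onto a heterogeneous on-chip memory architecture.
Paper reference translation (metadata) (2023-03-25T14:40:59Z)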
Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning Applications on Edge [10.067877168224337]
This research aims to overcome memory contention in order to meet the latency constraints of deep learning applications. We propose an efficient NN model management framework called Edge-MultiAI, which brings the NN models of DL applications into the edge memory. We show that Edge-MultiAI can increase the degree of multi-tenancy on the edge by at least 2X and the number of warm-starts by around 60% without any significant loss in the inference accuracy of the applications.
Paper reference translation (metadata) (2022-11-14T06:17:32Z)
Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases [9.927754948343326]
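Processing-in-Memory (PIM) is a promising execution paradigm that alleviates the data movement bottleneck in modern applications. This paper shows how to take advantage of the PIM paradigm for two modern data-intensive applications.
Paper reference translation (metadata) (2022-05-29T13:43:17Z)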
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
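Existing solutions do not provide the procedures and pipelines needed to actually deploy machine learning capabilities in real production-grade systems. This paper presents a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script language engines.
Paper reference translation (metadata) (2021-12-22T14:45:37Z)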
Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum [55.6313942302582]
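We propose a methodology to support the optimization of real-time applications on the Edge-to-Cloud Continuum. The approach relies on a rigorous analysis of possible configurations in a controlled testbed environment to understand their behaviour. The methodology can be generalized to other applications on the Edge-to-Cloud Continuum.
Paper reference translation (metadata) (2021-08-04T07:35:14Z)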
OODIn: An Optimised On-Device Inference Framework for Heterogeneous Mobile Devices [5.522962791793502]
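OODIn is a framework for the optimised deployment of deep learning apps across heterogeneous mobile devices. It addresses the variability in device resources and DL models through a highly parametrised multi-layer design. It achieves performance gains of up to 4.3x and 3.5x over highly optimised platform- and model-aware designs.
Paper reference translation (metadata) (2021-06-08T22:38:18Z)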
Optimising Resource Management for Embedded Machine Learning [23.00896228073755]
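Machine learning inference is increasingly executed locally on mobile and embedded platforms. We propose an online resource management approach for heterogeneous multi-core systems.
Paper reference translation (metadata) (2021-05-08T06:10:05Z)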

The related papers list is generated automatically from the titles and abstracts of papers on this site.
