Fugu-MT Paper Translation (Abstract): eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing

Paper overview: eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing

arxiv url: http://arxiv.org/abs/2508.10370v1
Date: Thu, 14 Aug 2025 06:08:05 GMT
Status: Translation complete
Last updated in system: 2025-08-15 22:24:48.198368
Title: eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing
Title (reference translation): eMamba: An Efficient Acceleration Framework for Mamba Models in Edge Computing
Authors: Jiyong Kim, Jaeho Lee, Jiahao Lin, Alish Kanani, Miao Sun, Umit Y. Ogras, Jaehyun Park
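Abstract summary: State Space Model (SSM)-based machine learning architectures have recently gained significant attention for processing sequential data. eMamba is a comprehensive end-to-end hardware acceleration framework explicitly designed for deploying Mamba models on edge platforms. The authors show that eMamba achieves accuracy comparable to state-of-the-art techniques using 1.63-19.9$\times$ fewer parameters.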
Reference score (independently computed attention score): 14.932572899503935
License: http://creativecommons.org/licenses/by/4.0/
Abstract: State Space Model (SSM)-based machine learning architectures have recently gained significant attention for processing sequential data. Mamba, a recent sequence-to-sequence SSM, offers competitive accuracy with superior computational efficiency compared to state-of-the-art transformer models. While this advantage makes Mamba particularly promising for resource-constrained edge devices, no hardware acceleration frameworks are currently optimized for deploying it in such environments. This paper presents eMamba, a comprehensive end-to-end hardware acceleration framework explicitly designed for deploying Mamba models on edge platforms. eMamba maximizes computational efficiency by replacing complex normalization layers with lightweight hardware-aware alternatives and approximating expensive operations, such as SiLU activation and exponentiation, considering the target applications. Then, it performs an approximation-aware neural architecture search (NAS) to tune the learnable parameters used during approximation. Evaluations with Fashion-MNIST, CIFAR-10, and MARS, an open-source human pose estimation dataset, show eMamba achieves comparable accuracy to state-of-the-art techniques using 1.63-19.9$\times$ fewer parameters. In addition, it generalizes well to large-scale natural language tasks, demonstrating stable perplexity across varying sequence lengths on the WikiText2 dataset. We also quantize and implement the entire eMamba pipeline on an AMD ZCU102 FPGA and ASIC using GlobalFoundries (GF) 22 nm technology. Experimental results show 4.95-5.62$\times$ lower latency and 2.22-9.95$\times$ higher throughput, with 4.77$\times$ smaller area, 9.84$\times$ lower power, and 48.6$\times$ lower energy consumption than baseline solutions while maintaining competitive accuracy.
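Abstract (reference translation): State Space Model (SSM)-based machine learning architectures have recently attracted significant attention for processing sequential data. Mamba, a recent sequence-to-sequence SSM, offers competitive accuracy with better computational efficiency than state-of-the-art transformer models. This advantage makes Mamba especially promising for resource-constrained edge devices, but no hardware acceleration framework is currently optimized for deploying it in such environments. This paper describes eMamba, an end-to-end hardware acceleration framework for deploying Mamba models on edge platforms. eMamba maximizes computational efficiency by replacing complex normalization layers with lightweight hardware-aware alternatives and by approximating expensive operations, such as SiLU activation and exponentiation, with the target applications in mind. It then performs an approximation-aware neural architecture search (NAS) to tune the learnable parameters used during approximation. Evaluations on Fashion-MNIST, CIFAR-10, and MARS, an open-source human pose estimation dataset, show that eMamba achieves accuracy comparable to state-of-the-art techniques using 1.63-19.9$\times$ fewer parameters. It also generalizes well to large-scale natural language tasks, showing stable perplexity across varying sequence lengths on the WikiText2 dataset. The entire eMamba pipeline is also quantized and implemented on an AMD ZCU102 FPGA and an ASIC using GlobalFoundries (GF) 22 nm technology. Experimental results show 4.95-5.62$\times$ lower latency and 2.22-9.95$\times$ higher throughput, with 4.77$\times$ smaller area, 9.84$\times$ lower power, and 48.6$\times$ lower energy consumption than baseline solutions, while maintaining competitive accuracy.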

Related papers