Fugu-MT Paper Translation (Abstract): Tri-Accel: Curvature-Aware Precision-Adaptive and Memory-Elastic Optimization for Efficient GPU Usage

Paper overview: Tri-Accel: Curvature-Aware Precision-Adaptive and Memory-Elastic Optimization for Efficient GPU Usage

arxiv url: http://arxiv.org/abs/2508.16905v1
Date: Sat, 23 Aug 2025 05:38:42 GMT
Status: Translation complete
Last updated in system: 2025-08-26 18:43:45.245194
Title: Tri-Accel: Curvature-Aware Precision-Adaptive and Memory-Elastic Optimization for Efficient GPU Usage
Title (reference translation): Tri-Accel: Curvature-Adaptive and Memory-Elastic Optimization for Efficient GPU Usage
Authors: Mohsen Sheibanian, Pouya Shaeri, Alimohammad Beigi, Ryan T. Woo, Aryan Keluskar
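Abstract summary: Tri-Accel is a unified optimization framework that co-adapts three acceleration strategies along with adaptive parameters during training. On CIFAR-10 with ResNet-18 and EfficientNet-B0, Tri-Accel achieves up to a 9.9% reduction in training time and 13.3% lower memory usage. Compared to static mixed-precision training, Tri-Accel maintains 78.1% accuracy while reducing the memory footprint from 0.35GB to 0.31GB on standard hardware.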
Reference score (independently computed attention score): 0.6511750267058007
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks are increasingly bottlenecked by the cost of optimization, both in terms of GPU memory and compute time. Existing acceleration techniques, such as mixed precision, second-order methods, and batch size scaling, are typically used in isolation. We present Tri-Accel, a unified optimization framework that co-adapts three acceleration strategies along with adaptive parameters during training: (1) Precision-Adaptive Updates that dynamically assign mixed-precision levels to layers based on curvature and gradient variance; (2) Sparse Second-Order Signals that exploit Hessian/Fisher sparsity patterns to guide precision and step size decisions; and (3) Memory-Elastic Batch Scaling that adjusts batch size in real time according to VRAM availability. On CIFAR-10 with ResNet-18 and EfficientNet-B0, Tri-Accel achieves up to 9.9% reduction in training time and 13.3% lower memory usage, while improving accuracy by +1.1 percentage points over FP32 baselines. Tested on CIFAR-10/100, our approach demonstrates adaptive learning behavior, with efficiency gradually improving over the course of training as the system learns to allocate resources more effectively. Compared to static mixed-precision training, Tri-Accel maintains 78.1% accuracy while reducing memory footprint from 0.35GB to 0.31GB on standard hardware. The framework is implemented with custom Triton kernels, whose hardware-aware adaptation enables automatic optimization without manual hyperparameter tuning, making it practical for deployment across diverse computational environments. This work demonstrates how algorithmic adaptivity and hardware awareness can be combined to improve scalability in resource-constrained settings, paving the way for more efficient neural network training on edge devices and cost-sensitive cloud deployments.
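Abstract (reference translation): Deep neural networks are increasingly bottlenecked by the cost of optimization, in both GPU memory and compute time. Existing acceleration techniques, such as mixed precision, second-order methods, and batch size scaling, are typically used in isolation. We present Tri-Accel, a unified optimization framework that co-adapts three acceleration strategies along with adaptive parameters during training: (1) Precision-Adaptive Updates, which dynamically assign mixed-precision levels to layers based on curvature and gradient variance; (2) Sparse Second-Order Signals, which exploit Hessian/Fisher sparsity patterns to guide precision and step size decisions; and (3) Memory-Elastic Batch Scaling, which adjusts batch size in real time according to VRAM availability. On CIFAR-10 with ResNet-18 and EfficientNet-B0, it achieves up to a 9.9% reduction in training time and 13.3% lower memory usage, while improving accuracy by 1.1 percentage points over FP32 baselines. Tested on CIFAR-10/100, the approach demonstrates adaptive learning behavior. Compared to static mixed-precision training, Tri-Accel maintains 78.1% accuracy while reducing the memory footprint from 0.35GB to 0.31GB on standard hardware. The framework is implemented with custom Triton kernels, whose hardware-aware adaptation enables automatic optimization without manual hyperparameter tuning, making it practical for deployment across diverse computational environments. This work shows how algorithmic adaptivity and hardware awareness can be combined to improve scalability in resource-constrained settings, enabling more efficient neural network training on edge devices and cost-sensitive cloud deployments.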
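
The abstract names two adaptive decisions, per-layer precision assignment and VRAM-driven batch sizing, without giving the rules behind them. The following is a minimal Python sketch of what such rules could look like, not the authors' method: the LayerStats fields, the thresholds, the halve/double policy, and the function names choose_precision and elastic_batch_size are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LayerStats:
    """Per-layer signals. How Tri-Accel estimates them is not stated in the abstract."""
    name: str
    curvature: float      # assumed: e.g. a diagonal Hessian/Fisher estimate
    grad_variance: float  # assumed: variance of recent gradients


def choose_precision(stats: LayerStats,
                     curvature_threshold: float = 1.0,
                     variance_threshold: float = 0.1) -> str:
    """Hypothetical rule: sensitive layers stay in FP32, the rest drop to FP16."""
    if (stats.curvature > curvature_threshold
            or stats.grad_variance > variance_threshold):
        return "fp32"
    return "fp16"


def elastic_batch_size(batch_size: int, free_bytes: int, total_bytes: int,
                       low: float = 0.10, high: float = 0.30,
                       min_batch: int = 16, max_batch: int = 512) -> int:
    """Hypothetical rule: halve the batch when free VRAM is scarce,
    double it when free VRAM is plentiful, otherwise leave it alone."""
    free_fraction = free_bytes / total_bytes
    if free_fraction < low:
        return max(min_batch, batch_size // 2)
    if free_fraction > high:
        return min(max_batch, batch_size * 2)
    return batch_size


# Illustrative values only; these are not measurements from the paper.
layers = [
    LayerStats("conv1", curvature=2.5, grad_variance=0.02),
    LayerStats("layer3.conv2", curvature=0.2, grad_variance=0.01),
    LayerStats("fc", curvature=0.4, grad_variance=0.5),
]
plan = {s.name: choose_precision(s) for s in layers}

GIB = 1024 ** 3
shrunk = elastic_batch_size(128, free_bytes=1 * GIB, total_bytes=16 * GIB)
grown = elastic_batch_size(128, free_bytes=8 * GIB, total_bytes=16 * GIB)
```

In a PyTorch training loop, the free and total byte counts could come from torch.cuda.mem_get_info(), which returns a (free, total) tuple in bytes. Whether Tri-Accel queries memory this way is not stated in the abstract.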

Related papers