Fugu-MT paper translation (summary): Speeding Up MACE: Low-Precision Tricks for Equivarient Force Fields

Paper summary: Speeding Up MACE: Low-Precision Tricks for Equivarient Force Fields

arxiv url: http://arxiv.org/abs/2510.23621v1
Date: Thu, 23 Oct 2025 14:02:34 GMT
Status: Translation complete
Last updated in system: 2025-10-29 15:35:36.288963
Title: Speeding Up MACE: Low-Precision Tricks for Equivarient Force Fields
Title (reference translation): Speeding Up MACE: Low-Precision Tricks for Equivariant Force Fields
Authors: Alexandre Benoit
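Abstract summary: Machine-learning force fields can deliver accurate molecular dynamics (MD) at high computational cost. This thesis aims to make MACE cheaper and faster by identifying computational bottlenecks and evaluating low-precision execution policies.
Reference score (independently computed attention measure): 51.95157731126864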
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Machine-learning force fields can deliver accurate molecular dynamics (MD) at high computational cost. For SO(3)-equivariant models such as MACE, there is little systematic evidence on whether reduced-precision arithmetic and GPU-optimized kernels can cut this cost without harming physical fidelity. This thesis aims to make MACE cheaper and faster while preserving accuracy by identifying computational bottlenecks and evaluating low-precision execution policies. We profile MACE end-to-end and per block, compare the e3nn and NVIDIA cuEquivariance backends, and assess FP64/FP32/BF16/FP16 settings (with FP32 accumulation) for inference, short NVT and long NPT water simulations, and toy training runs under reproducible, steady-state timing. cuEquivariance reduces inference latency by about $3\times$. Casting only linear layers to BF16/FP16 within an FP32 model yields roughly $4\times$ additional speedups, while energies and thermodynamic observables in NVT/NPT MD remain within run-to-run variability. Half-precision weights during training degrade force RMSE. Mixing e3nn and cuEq modules without explicit adapters causes representation mismatches. Fused equivariant kernels and mixed-precision inference can substantially accelerate state-of-the-art force fields with negligible impact on downstream MD. A practical policy is to use cuEquivariance with FP32 by default and enable BF16/FP16 for linear layers (keeping FP32 accumulations) for maximum throughput, while training remains in FP32. Further gains are expected on Ampere/Hopper GPUs (TF32/BF16) and from kernel-level FP16/BF16 paths and pipeline fusion.
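Abstract (reference translation): Machine-learning force fields can provide accurate molecular dynamics (MD) at high computational cost. For SO(3)-equivariant models such as MACE, there is little systematic evidence on whether reduced-precision arithmetic and GPU-optimized kernels can lower this cost without compromising physical fidelity. This thesis aims to make MACE cheaper and faster while preserving accuracy, by identifying computational bottlenecks and evaluating low-precision execution policies. MACE is profiled end-to-end and per block, the e3nn and NVIDIA cuEquivariance backends are compared, and FP64/FP32/BF16/FP16 settings are evaluated for inference, short NVT and long NPT water simulations, and toy training runs under reproducible, steady-state timing. cuEquivariance reduces inference latency by a factor of about $3\times$. Casting only the linear layers to BF16/FP16 within an FP32 model gives a further speedup of roughly $4\times$, while energies and thermodynamic observables in NVT/NPT MD stay within run-to-run variability. Half-precision weights during training degrade force RMSE. Mixing e3nn and cuEq modules without explicit adapters causes representation mismatches. Fused equivariant kernels and mixed-precision inference can substantially accelerate state-of-the-art force fields with negligible impact on downstream MD. A practical policy is to use cuEquivariance with FP32 by default and to enable BF16/FP16 for linear layers (keeping FP32 accumulation) for maximum throughput, while training remains in FP32. Further gains are expected on Ampere/Hopper GPUs (TF32/BF16) and from kernel-level FP16/BF16 paths and pipeline fusion.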
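
The abstract's recommendation to keep FP32 accumulation when linear layers run in FP16 can be illustrated with a small arithmetic sketch. The code below is not taken from the thesis and uses neither MACE, e3nn, nor cuEquivariance; it is a standard-library emulation, and the helper names (`to_fp16`, `to_fp32`, `dot`, `compare`) are hypothetical. It rounds the inputs of a dot product to half precision and compares two accumulators: one rounded to FP32 after every addition, one rounded to FP16.

```python
import math
import random
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Dot product with FP16 inputs; the accumulator is rounded by round_acc."""
    acc = 0.0
    for x, y in zip(a, b):
        # Inputs are stored in half precision, as low-precision weights
        # and activations of a linear layer would be (assumption).
        prod = to_fp16(x) * to_fp16(y)
        # Emulate the accumulator precision by rounding after each step.
        acc = round_acc(acc + round_acc(prod))
    return acc


def compare(n: int = 4096, seed: int = 0):
    """Return absolute errors (FP32 accumulation, FP16 accumulation)."""
    rng = random.Random(seed)
    a = [rng.random() for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    reference = math.fsum(x * y for x, y in zip(a, b))  # float64 reference
    err_fp32_acc = abs(dot(a, b, to_fp32) - reference)
    err_fp16_acc = abs(dot(a, b, to_fp16) - reference)
    return err_fp32_acc, err_fp16_acc


if __name__ == "__main__":
    e32, e16 = compare()
    print(f"FP16 inputs, FP32 accumulation: abs error = {e32:.6f}")
    print(f"FP16 inputs, FP16 accumulation: abs error = {e16:.6f}")
```

In this toy setting the FP16 accumulator stalls once the running sum is large compared with the individual products, because small addends fall below half the spacing between representable values. This is one plausible reason, under the assumptions above, for keeping accumulation in FP32 while storing inputs in lower precision.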


Related papers