Fugu-MT paper translation (abstract): BASIS: Balanced Activation Sketching with Invariant Scalars for "Ghost Backpropagation"

Paper overview: BASIS: Balanced Activation Sketching with Invariant Scalars for "Ghost Backpropagation"

arxiv url: http://arxiv.org/abs/2604.16324v1
Date: Thu, 05 Mar 2026 20:38:25 GMT
Status: Translation complete
Last updated in system: 2026-05-04 02:32:13.888456
Title: BASIS: Balanced Activation Sketching with Invariant Scalars for "Ghost Backpropagation"
Authors: Vladimer Khasia
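Abstract summary: The activation memory required for exact backpropagation scales linearly with network depth, context length, and feature dimensionality. This paper proposes an efficient backpropagation algorithm that fully decouples activation memory from the batch and sequence dimensions.
Reference score (attention metric computed by this site): 0.0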
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The activation memory required for exact backpropagation scales linearly with network depth, context length, and feature dimensionality, forming an O(L * BN) spatial bottleneck (where B is the sequence-batch cardinality and N is the feature dimension). This constraint historically throttles the scaling of deep neural networks. While randomized automatic differentiation attempts to mitigate this, it historically suffers from catastrophic variance. In this paper, we introduce BASIS (Balanced Activation Sketching with Invariant Scalars), an efficient backpropagation algorithm that fully decouples activation memory from the batch and sequence dimensions. BASIS propagates the exact error signal (dX) to preserve flawless gradient flow, but computes the weight updates (dW) using massively compressed rank-R tensors. To solve the foundational instability of sketched gradients, we propose two novel mechanisms: Balanced Hashing, which strictly eliminates off-diagonal collision variance, and Invariant Scalars, a principled bias-variance tradeoff that deterministically preserves the exact continuous energy norm of the spatial geometry. Theoretically, BASIS reduces activation memory to O(L * RN) and heavily decreases the backward pass matrix-multiplication footprint. Empirically, training a GPT architecture for 50,000 steps validates our theoretical guarantees: at R = 32, BASIS achieves parity with (and marginally outperforms) exact backpropagation validation loss (6.575 vs. 6.616), acting as an implicit regularizer. Remarkably, the stabilized magnitude trajectory allows the model to converge smoothly even under extreme spatial compression (R = 1), proving the extreme robustness of the estimator. The code is available at https://github.com/VladimerKhasia/basis
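The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea for a single linear layer Y = X @ W: the error signal dX is computed exactly, while the weight gradient dW is estimated from an R-row sketch of the activations. Everything specific here is an assumption for illustration and is not taken from the paper or its repository: the function names `balanced_sign_sketch` and `sketched_linear_backward`, the use of a signed bucket assignment with equal-sized buckets as a stand-in for "Balanced Hashing", and the Frobenius-norm rescaling as a stand-in for "Invariant Scalars".

```python
import numpy as np

def balanced_sign_sketch(B, R, rng):
    """Hypothetical stand-in for a balanced hash: build an (R, B) matrix S in
    which each of the B input rows lands in exactly one of R buckets, every
    bucket receives B // R rows, and each row carries a random sign."""
    assert B % R == 0, "this sketch assumes R divides B"
    buckets = rng.permutation(np.repeat(np.arange(R), B // R))
    signs = rng.choice([-1.0, 1.0], size=B)
    S = np.zeros((R, B))
    S[buckets, np.arange(B)] = signs
    return S

def sketched_linear_backward(X, W, dY, R, rng):
    """Backward pass for Y = X @ W with exact dX and a sketched dW.

    X: (B, N_in) activations, W: (N_in, N_out), dY: (B, N_out).
    In a memory-saving setting only the (R, N_in) sketch and one scalar would
    be kept from the forward pass; here everything is computed in one call
    purely for illustration.
    """
    B = X.shape[0]
    S = balanced_sign_sketch(B, R, rng)

    # Compressed activations: R rows instead of B rows.
    X_sk = S @ X
    # Assumed "invariant scalar": rescale the sketch so that its Frobenius
    # norm equals that of the full activations (this introduces bias).
    x_norm = np.linalg.norm(X)
    sk_norm = np.linalg.norm(X_sk)
    if sk_norm > 0:
        X_sk = X_sk * (x_norm / sk_norm)

    dX = dY @ W.T            # exact error signal
    dW = X_sk.T @ (S @ dY)   # estimate of X.T @ dY from R-row tensors
    return dX, dW

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))
W = rng.standard_normal((4, 3))
dY = rng.standard_normal((8, 3))
dX, dW = sketched_linear_backward(X, W, dY, R=2, rng=rng)
```

In this sketch the stored activation per layer shrinks from B x N_in to R x N_in values, which mirrors the O(L * BN) to O(L * RN) reduction stated in the abstract. When R equals B the matrix S is a signed permutation, so the estimate coincides with the exact gradient X.T @ dY.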
