Fugu-MT Paper Translation (Abstract): SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

Paper overview: SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

arxiv url: http://arxiv.org/abs/2605.10989v2
Date: Fri, 15 May 2026 04:30:07 GMT
Status: Translation complete
Last updated in system: 2026-05-19 03:45:13.105507
Title: SURGE: Surrogate Gradient Adaptation in Binary Neural Networks
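Title (reference translation): SURGE: Surrogate Gradient Adaptation in Binary Neural Networks
Authors: Haoyu Huang, Boyu Liu, Linlin Yang, Yanjing Li, Yuguang Yang, Xuhui Liu, Canyu Chen, Zhongqian Fu, Baochang Zhang
Abstract summary: SURGE (SURrogate GradiEnt Adaptation) is a learnable gradient compensation framework with theoretical grounding. SURGE mitigates gradient mismatch through auxiliary backpropagation. Experiments on image classification, object detection, and language understanding tasks show that SURGE outperforms state-of-the-art methods.
Reference score (attention score computed independently by this site): 41.424349870612716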
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The training of Binary Neural Networks (BNNs) is fundamentally based on gradient approximation for non-differentiable binarization operations (e.g., the sign function). However, prevailing methods, including the Straight-Through Estimator (STE) and its improved variants, rely on hand-crafted designs that suffer from the gradient mismatch problem and from information loss induced by fixed-range gradient clipping. To address this, we propose SURrogate GradiEnt Adaptation (SURGE), a novel learnable gradient compensation framework with theoretical grounding. SURGE mitigates gradient mismatch through auxiliary backpropagation. Specifically, we design a Dual-Path Gradient Compensator (DPGC) that constructs a parallel full-precision auxiliary branch for each binarized layer, decoupling gradient flow via output decomposition during backpropagation. DPGC enables bias-reduced gradient estimation by leveraging the full-precision branch to estimate components beyond STE's first-order approximation. To further enhance training stability, we introduce an Adaptive Gradient Scaler (AGS), based on an optimal scale factor, that dynamically balances inter-branch gradient contributions via norm-based scaling. Experiments on image classification, object detection, and language understanding tasks demonstrate that SURGE outperforms state-of-the-art methods.
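As background for the abstract's mention of the STE and fixed-range gradient clipping, the following is a minimal sketch of the standard clipped STE on plain Python lists. It is not SURGE itself. The function names and the clipping threshold of 1.0 are assumptions made for illustration and do not come from the paper.

```python
def sign(x):
    """Binarize a scalar to {-1.0, +1.0}; zero maps to +1.0 by convention here."""
    return 1.0 if x >= 0 else -1.0


def binarize_forward(xs):
    """Forward pass: apply the non-differentiable sign function elementwise."""
    return [sign(x) for x in xs]


def ste_backward(xs, grad_out, clip=1.0):
    """Backward pass of the clipped STE (illustrative, not the paper's method).

    The upstream gradient is passed through unchanged where |x| <= clip
    and set to zero elsewhere. `clip` is an assumed fixed range.
    """
    return [g if abs(x) <= clip else 0.0 for x, g in zip(xs, grad_out)]


# Latent full-precision values and an upstream gradient (made-up numbers).
xs = [-1.5, -0.3, 0.0, 0.7, 2.0]
grad_out = [0.1, 0.2, 0.3, 0.4, 0.5]

ys = binarize_forward(xs)
grad_in = ste_backward(xs, grad_out)

print(ys)       # binarized activations
print(grad_in)  # gradients for inputs outside [-1, 1] are dropped
```

In this toy example the first and last inputs lie outside the clipping range, so their gradients are discarded entirely. This is the kind of information loss the abstract attributes to fixed-range clipping.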
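Abstract (reference translation): Training Binary Neural Networks (BNNs) is fundamentally based on gradient approximation for non-differentiable binarization operations (e.g., the sign function). However, common methods, including the STE (Straight-Through Estimator) and its improved variants, rely on hand-crafted designs that suffer from the gradient mismatch problem and from information loss caused by fixed-range gradient clipping. We therefore propose SURGE (SURrogate GradiEnt Adaptation). SURGE mitigates gradient mismatch through auxiliary backpropagation. Specifically, we design a Dual-Path Gradient Compensator (DPGC) that constructs a parallel full-precision auxiliary branch for each binarized layer and decouples gradient flow via output decomposition during backpropagation. DPGC enables bias-reduced gradient estimation by using the full-precision branch to estimate components beyond the STE's first-order approximation. To further improve training stability, we introduce an Adaptive Gradient Scaler (AGS). Experiments on image classification, object detection, and language understanding tasks show that SURGE outperforms state-of-the-art methods.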
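The abstract says that AGS balances the gradient contributions of the two branches via norm-based scaling, but it does not give the formula or the optimal scale factor. The sketch below only illustrates the general idea of norm-based balancing. The rule used here (rescale the auxiliary gradient so its L2 norm matches the main gradient's) and all names are hypothetical and should not be read as the paper's AGS.

```python
import math


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def scale_to_match(main_grad, aux_grad, eps=1e-12):
    """Rescale aux_grad to the L2 norm of main_grad.

    Hypothetical balancing rule chosen for illustration only; the paper's
    actual scale factor is not stated in the abstract.
    """
    scale = l2_norm(main_grad) / (l2_norm(aux_grad) + eps)
    return scale, [scale * g for g in aux_grad]


# Made-up gradients from a binarized branch and an auxiliary branch.
main_grad = [3.0, 4.0]
aux_grad = [0.6, 0.8]

scale, scaled_aux = scale_to_match(main_grad, aux_grad)
combined = [m + a for m, a in zip(main_grad, scaled_aux)]

print(scale)
print(combined)
```

With these numbers the auxiliary gradient is much smaller than the main one before scaling. After scaling, both contribute with equal norm to the combined update, so neither branch dominates.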
