Fugu-MT Paper Translation (Abstract): DREG: A Layer-Wise Jacobian Regularization as a General-Purpose Penalty

Paper overview: DREG: A Layer-Wise Jacobian Regularization as a General-Purpose Penalty

arxiv url: http://arxiv.org/abs/2606.23942v1
Date: Mon, 22 Jun 2026 21:04:45 GMT
Status: Translation complete
Last updated in system: 2026-06-24 22:16:48.690498
Title: DREG: A Layer-Wise Jacobian Regularization as a General-Purpose Penalty
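Title (reference translation): DREG: Layer-Wise Jacobian Regularization as a General-Purpose Penalty
Authors: Rowan Martnishn
Abstract summary: We report a large-scale empirical study isolating the contributions of the Derivative Regularization penalty (DREG). Across 960 experiments spanning 4 activations, 6 regularizers, 8 datasets, and 5 random seeds, we ask when, where, and why DREG works. DREG achieves the highest overall and clean-regime accuracy among all regularizers evaluated. DREG and Spectral Normalization (SN) are the only two layer-wise regularizers in the study.
Reference score (independently computed attention level): 0.0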
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: We present a large-scale empirical study isolating the contributions of the Derivative Regularization penalty (DREG). Across a fully-crossed factorial sweep of 960 experiments spanning 4 activations, 6 regularizers, 8 datasets, and 5 random seeds, we ask: when, where, and why does DREG work? Our results establish three principal findings. First, DREG achieves the highest overall and clean-regime accuracy among all regularizers evaluated (significantly so against the unregularized baseline, Weight Decay, and IGPen; Wilcoxon $p \leq 0.031$). It ranks second in noise robustness behind Spectral Normalization (SN); DREG and SN are the only two layer-wise regularizers in the study. Second, DREG is globally the best-performing regularizer under GELU, the default activation in modern transformer architectures, particularly on both messy vision and messy NLP benchmarks, suggesting direct applicability to frontier deep learning settings. Third, DREG's advantage over competing regularizers is most pronounced under data scarcity, consistent with its role as a geometric inductive bias that substitutes for the regularizing effect of data volume. Throughout, DREG is applied with a single fixed hyperparameter $\lambda = 10^{-2.5}$ and no per-dataset tuning, supporting its characterization as a plug-and-play regularizer for neural networks with nontrivial Jacobian structure. These findings are consistent with DREG's design: concentrating regularization pressure on layers where the activation derivative is largest, rather than constraining the network uniformly.
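Abstract (reference translation): This paper introduces a large-scale empirical study that isolates the contributions of the Derivative Regularization penalty (DREG). Covering 960 experiments across 4 activations, 6 regularizers, 8 datasets, and 5 random seeds, we ask when, where, and why DREG works. The results yield three main findings. First, DREG attains the highest overall and clean-regime accuracy of all regularizers evaluated (significantly so against the unregularized baseline, Weight Decay, and IGPen; Wilcoxon $p \leq 0.031$). In noise robustness it ranks second behind Spectral Normalization (SN); DREG and SN are the only two layer-wise regularizers in the study. Second, under GELU, the default activation in modern transformer architectures, DREG is the best-performing regularizer overall, particularly on both messy vision and messy NLP benchmarks, which suggests direct applicability to frontier deep learning settings. Third, DREG's advantage over competing regularizers is most pronounced under data scarcity, consistent with its role as a geometric inductive bias that substitutes for the regularizing effect of data volume. DREG is applied with a single fixed hyperparameter $\lambda = 10^{-2.5}$ and no per-dataset tuning, which supports characterizing it as a plug-and-play regularizer for neural networks with nontrivial Jacobian structure. These results are consistent with DREG's design, which concentrates regularization pressure on the layers where the activation derivative is largest rather than constraining the network uniformly.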
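The experiment count and the fixed hyperparameter quoted in the abstract can be checked with a few lines of Python. This is only an illustrative sketch: the factor sizes (4, 6, 8, 5) and the exponent -2.5 come from the abstract, while every label below (`activation_0`, `dataset_0`, `unnamed_sixth`, and so on) is a hypothetical placeholder, since the abstract does not list all activations, regularizers, or datasets.

```python
from itertools import product

# Factor sizes are taken from the abstract; the labels are placeholders
# (assumptions), because the abstract does not enumerate every level.
activations = [f"activation_{i}" for i in range(4)]  # GELU is one of the four
regularizers = ["none", "weight_decay", "igpen", "sn", "dreg", "unnamed_sixth"]
datasets = [f"dataset_{i}" for i in range(8)]
seeds = range(5)

# A fully-crossed factorial sweep runs every combination exactly once.
runs = list(product(activations, regularizers, datasets, seeds))
n_runs = len(runs)

# The single fixed penalty weight reported in the abstract.
lam = 10 ** -2.5

print(n_runs)          # 4 * 6 * 8 * 5 = 960
print(round(lam, 5))   # 0.00316
```

The product 4 × 6 × 8 × 5 matches the 960 experiments reported, and $10^{-2.5}$ is roughly 0.00316, about one third of $10^{-2}$.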

Related papers