Fugu-MT Paper Translation (Summary): Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization

Paper summary: Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization

arxiv url: http://arxiv.org/abs/2509.23898v1
Date: Sun, 28 Sep 2025 14:08:29 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.518242
Title: Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization
Authors: Chris Kolb, Laetitia Frost, Bernd Bischl, David Rügamer
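Abstract summary: We show that $D$-Gating is theoretically equivalent to solving the original group sparsity problem. We validate our theory across vision, language, and tabular tasks.
Reference score (independently computed attention score): 22.883367233817836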
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Structured sparsity regularization offers a principled way to compact neural networks, but its non-differentiability breaks compatibility with conventional stochastic gradient descent and requires either specialized optimizers or additional post-hoc pruning without formal guarantees. In this work, we propose $D$-Gating, a fully differentiable structured overparameterization that splits each group of weights into a primary weight vector and multiple scalar gating factors. We prove that any local minimum under $D$-Gating is also a local minimum using non-smooth structured $L_{2,2/D}$ penalization, and further show that the $D$-Gating objective converges at least exponentially fast to the $L_{2,2/D}$-regularized loss in the gradient flow limit. Together, our results show that $D$-Gating is theoretically equivalent to solving the original group sparsity problem, yet induces distinct learning dynamics that evolve from a non-sparse regime into sparse optimization. We validate our theory across vision, language, and tabular tasks, where $D$-Gating consistently delivers strong performance-sparsity tradeoffs and outperforms both direct optimization of structured penalties and conventional pruning baselines.
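The link between the smooth overparameterized penalty and the non-smooth $L_{2,2/D}$ penalty can be checked numerically on a single weight group. The following is a minimal NumPy sketch, not the authors' implementation. It assumes each group is written as $w = v \cdot g_1 \cdots g_{D-1}$ (a primary vector times $D-1$ scalar gates) and that a plain squared-$L_2$ penalty is applied to all $D$ factors; the function names are hypothetical. Under that assumption, the AM-GM inequality gives $\|v\|_2^2 + \sum_d g_d^2 \ge D\,\|w\|_2^{2/D}$, with equality when all factors have equal magnitude.

```python
import numpy as np


def gated_weight(v, gates):
    """Effective group weight w = v * prod(gates) (assumed D-Gating form)."""
    return v * np.prod(gates)


def smooth_penalty(v, gates):
    """Differentiable squared-L2 penalty on all D factors."""
    return float(np.dot(v, v) + np.sum(np.square(gates)))


def group_penalty(w, D):
    """Non-smooth group penalty D * ||w||_2^(2/D)."""
    return float(D * np.linalg.norm(w) ** (2.0 / D))


def balanced_factors(w, D):
    """Factorization of w in which every factor has magnitude ||w||^(1/D)."""
    norm = np.linalg.norm(w)
    if norm == 0.0:
        return np.zeros_like(w), np.zeros(D - 1)
    scale = norm ** (1.0 / D)
    v = w / norm * scale            # primary vector with ||v|| = scale
    gates = np.full(D - 1, scale)   # D - 1 scalar gates, each equal to scale
    return v, gates


rng = np.random.default_rng(0)
D = 3
w = rng.normal(size=4)              # one weight group

# Balanced factorization: the smooth penalty meets the AM-GM lower bound.
v_bal, g_bal = balanced_factors(w, D)

# Unbalanced factorization of the same w: gates multiply to 1, so v = w.
g_unb = np.array([2.0, 0.5])
v_unb = w / np.prod(g_unb)

print("group penalty      :", group_penalty(w, D))
print("smooth, balanced   :", smooth_penalty(v_bal, g_bal))
print("smooth, unbalanced :", smooth_penalty(v_unb, g_unb))
```

The balanced factorization reproduces the group penalty, while any other factorization of the same $w$ pays at least as much. This is only the static side of the equivalence; the convergence of the training dynamics toward the balanced regime is the paper's result and is not demonstrated by this sketch.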

Related papers

FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [61.79405341803085]
Low-rank adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL).
Paper reference translation (metadata) (2025-05-19T07:32:56Z)
Smoothed Normalization for Efficient Distributed Private Optimization [54.197255548244705]
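Federated learning enables training machine learning models while preserving the privacy of participants. No differentially private distributed method exists for smooth, non-convex optimization problems. We introduce $\alpha$-$\mathsf{NormEC}$, a distributed algorithm with provable convergence guarantees.
Paper reference translation (metadata) (2025-02-19T07:10:32Z)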
MGDA Converges under Generalized Smoothness, Provably [27.87166415148172]
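Multi-objective optimization (MOO) is receiving increasing attention in various fields such as multi-task learning. Recent works provide effective algorithms with theoretical analysis, but they are limited by the standard $L$-smooth or bounded-gradient assumptions. We study a more general and realistic class of generalized $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of the gradient norm.
Paper reference translation (metadata) (2024-05-29T18:36:59Z)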
Decoupled Weight Decay for Any $p$ Norm [1.1510009152620668]
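We consider a simple yet effective approach to sparsification based on Bridge, or $L_p$, regularization during training. We introduce a novel weight decay scheme that generalizes standard $L_2$ weight decay to any $p$ norm. We empirically demonstrate that it leads to highly sparse networks while maintaining performance comparable to standard $L_2$ regularization.
Paper reference translation (metadata) (2024-04-16T18:02:15Z)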
Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
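This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. Instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape, and we show how linear interpolation can help by leveraging the theory of nonexpansive operators.
Paper reference translation (metadata) (2023-10-20T12:45:12Z)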
Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
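This paper proposes an online convex optimization approach with two different levels of adaptivity. We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$, and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave, and convex loss functions, respectively.
Paper reference translation (metadata) (2023-07-17T09:55:35Z)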

The related papers list is generated automatically from the titles and abstracts of papers on this site.

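This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and assumes no responsibility for any results arising from the use of this site (including all information and translations).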