Fugu-MT paper translation (abstract): Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts

Paper overview: Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts

arxiv url: http://arxiv.org/abs/2606.19036v1
Date: Wed, 17 Jun 2026 13:06:00 GMT
Status: translation complete
Last updated in system: 2026-06-18 17:16:51.169834
Title: Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts
Title (reference translation): Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts
Authors: Tho Tran Huu, Huu-Tuan Nguyen, Thien-Hai Nguyen, Nhat-Tri Ho, Viet-Hoang Tran, Tho Quan, Tan Minh Nguyen
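Abstract summary: Sparse Mixture-of-Experts (SMoE) architectures are now widely deployed in language and vision models. The Top-$k$ expert selection that enables conditional routing renders the SMoE map inherently discontinuous. Lower-order discontinuity sets dominate, whereas higher-order sets occupy a vanishingly small relative volume. The paper proposes a simple smoothing mechanism that can be applied to existing SMoEs.
Reference score (attention score computed independently by this site): 15.710271326191219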
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sparse Mixture-of-Experts (SMoE) architectures are now widely deployed in state-of-the-art language and vision models, where conditional routing allows scaling to very large networks. However, the very Top-$k$ expert selection that enables conditional routing also renders the SMoE map inherently discontinuous. In the vicinity of these discontinuity surfaces, even inputs that are arbitrarily close may activate substantially different sets of experts, resulting in significantly different outputs. In this work we give a rigorous geometric and stochastic analysis of these discontinuities. We first classify them by order, determined by the number of tied experts at a switching event. Using measure-theoretic slicing arguments, we establish asymptotic volume estimates for the thickened discontinuity surfaces, showing that lower-order discontinuity sets dominate, whereas higher-order ones occupy a vanishingly small relative volume. Next, modeling random perturbations in the input space via a diffusion process, we prove that the path eventually encounters a discontinuity, and moreover that the first hit almost surely occurs on an order-1 discontinuity, with explicit finite-time probability bounds. We further derive occupation-time bounds that quantify the time the random path spends in the neighborhood of each discontinuity order. These theoretical results imply that inputs are more likely to lie near lower-order discontinuities. Motivated by this insight, we propose a simple smoothing mechanism that can be directly applied to existing SMoEs, softly incorporating experts near discontinuities; our analysis guarantees that the added computational overhead remains small while providing localized smoothing near discontinuities, and experiments across language and vision tasks show that smoothing not only enforces continuity of the SMoE map but also enhances empirical performance.
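Abstract (reference translation): Sparse Mixture-of-Experts (SMoE) architectures are widely deployed in state-of-the-art language and vision models. However, the Top-$k$ expert selection that enables conditional routing also makes the SMoE map inherently discontinuous. Near these discontinuity surfaces, even arbitrarily close inputs can activate substantially different sets of experts and so produce very different outputs. This work gives a rigorous geometric and stochastic analysis of these discontinuities. The discontinuities are first classified by order, determined by the number of experts tied at a switching event. Measure-theoretic slicing arguments yield asymptotic volume estimates for the thickened discontinuity surfaces, showing that lower-order discontinuity sets dominate while higher-order ones occupy a vanishingly small relative volume. Next, random perturbations in input space are modeled by a diffusion process, and the path is proved to eventually encounter a discontinuity, with the first hit occurring almost surely on an order-1 discontinuity and with explicit finite-time probability bounds. Occupation-time bounds are also derived, quantifying the time the random path spends in the neighborhood of each discontinuity order. These theoretical results imply that inputs tend to lie near lower-order discontinuities. Based on this insight, the authors propose a simple smoothing mechanism that applies directly to existing SMoEs and softly incorporates experts near discontinuities; the computational overhead stays small while smoothing is localized near discontinuities, and smoothing both enforces continuity of the SMoE map and improves empirical performance.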
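The discontinuity described in the abstract can be reproduced on a toy problem. The sketch below is a minimal illustration under stated assumptions, not the authors' method: the gate matrix `W`, the constant toy `experts`, the function names, and in particular the margin-based ramp with width `eps` in `soft_topk_moe` are all hypothetical choices made for this example. The ramp is just one simple way to "softly incorporate experts near discontinuities"; the paper's actual mechanism may differ.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def hard_topk_moe(x, W, experts, k=1):
    """Standard Top-k SMoE: keep the k highest-scoring experts and
    renormalize their gate weights with a softmax over the kept scores."""
    s = W @ x                      # one gate score per expert
    top = np.argsort(s)[-k:]       # indices of the k largest scores
    g = softmax(s[top])
    return sum(gi * experts[i](x) for gi, i in zip(g, top))

def soft_topk_moe(x, W, experts, k=1, eps=0.1):
    """Hypothetical smoothed variant (an assumption, not the paper's scheme).
    Experts whose score is within eps of the k-th largest score are blended
    in with a linear ramp, so weights vary continuously with the scores."""
    s = W @ x
    kth = np.sort(s)[-k]           # k-th largest score (continuous in s)
    ramp = np.clip((s - kth + eps) / eps, 0.0, 1.0)  # 1 for top-k, 0 if gap >= eps
    w = ramp * softmax(s)
    w = w / w.sum()
    return sum(wi * experts[i](x) for i, wi in enumerate(w) if wi > 0)

# Toy setup: two experts with very different constant outputs.
W = np.eye(2)
experts = [lambda x: np.array([1.0]), lambda x: np.array([-1.0])]

# Two inputs that straddle the tie surface s_0 = s_1 at distance 2e-6.
d = 1e-6
x_a = np.array([1.0 + d, 1.0])
x_b = np.array([1.0 - d, 1.0])

jump_hard = np.abs(hard_topk_moe(x_a, W, experts) - hard_topk_moe(x_b, W, experts)).max()
jump_soft = np.abs(soft_topk_moe(x_a, W, experts) - soft_topk_moe(x_b, W, experts)).max()
print("hard Top-1 output gap:", jump_hard)
print("smoothed output gap:  ", jump_soft)
```

With hard Top-1 routing the two nearby inputs select different experts, so the output gap stays large however small `d` is; with the ramp, both experts are blended near the tie and the gap shrinks with `d`. When the score gap exceeds `eps`, the ramp is zero for the excluded experts and `soft_topk_moe` reduces to `hard_topk_moe`, so extra experts are evaluated only in a thin band around the switching surface.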

Related papers

Fitting Unknown Number of Hyperplanes with Manifold Optimization [57.48093263119306]
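Fitting an unknown number of hyperplanes to data is a fundamental problem in machine learning. Existing approaches often struggle with optimization or lack geometric consistency.
Paper reference translation (metadata) (2026-05-27T14:02:20Z)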
Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics [92.39053980710702]
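Stochastic Gradient Descent (SGD) is usually modeled as a Langevin process, under the assumption that minibatch noise acts as Brownian motion. This approximation relies on a continuous-time limit and on a sqrt(eta) noise scaling that does not match discrete SGD updates at a finite learning rate. The paper proposes an alternative formulation of SGD as deterministic dynamics on a fluctuating loss landscape induced by minibatch sampling.
Paper reference translation (metadata) (2026-05-21T15:50:40Z)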
Reading Calibrated Uncertainty from Language Model Trajectories [46.663987199083245]
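Methods that probe a model's internal activations feed raw hidden states to the probe as opaque snapshots, leaving implicit the layer-by-layer trajectory along which representations form. The authors extract 11 scale-invariant geometric features that trace the cumulative path of per-layer updates and feed them to a sparse linear probe. The probe beats baseline scaling by up to 21 AURC points and outperforms MPP under selective abstention.
Paper reference translation (metadata) (2026-05-19T19:24:29Z)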
A Differentiable Bayesian Relaxation for Latent Partial-Order Inference [2.124421328820064]
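Many ranking and agent-trace datasets are recorded as linear orders even though their latent structure is only partially ordered. The paper gives a differentiable relaxation of latent partial-order inference from such traces. The authors prove soft transitivity, sharp-limit frontier recovery, and convergence to the hard case.
Paper reference translation (metadata) (2026-05-07T21:47:41Z)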
On the continuity of flows [0.10152838128195464]
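This work shows that the optimal velocity field of the flow matching objective exhibits spatial discontinuities. The discontinuity arises from the requirement that a continuous flow must bifurcate in order to map a single mode to multiple modes. The analysis suggests that this phenomenon results from a topological mismatch between distributions rather than from the $L^2$ loss.
Paper reference translation (metadata) (2025-12-14T20:00:39Z)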
Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations [57.179679246370114]
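The paper identifies the distribution of random perturbations that minimizes the estimator's variance as the perturbation step size tends to zero. The results suggest that perturbations can be aligned in direction with the true gradient instead of being kept at a fixed length.
Paper reference translation (metadata) (2025-10-22T19:06:39Z)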
Flow based approach for Dynamic Temporal Causal models with non-Gaussian or Heteroscedastic Noises [37.02662517645979]
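The paper introduces FANTOM, a unified framework for causal discovery. It handles non-stationary processes together with non-Gaussian and heteroscedastic noise. It simultaneously infers the number of regimes and their corresponding indices, and learns a directed acyclic graph for each regime.
Paper reference translation (metadata) (2025-06-20T15:12:43Z)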
Discrete-to-Continuum Approach for the Analytic Continuation of One-Particle Propagator on the Circle [0.0]
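Finite expressions suitable for numerical evaluation are derived for the free discrete propagator on the circle. These expressions allow the propagator to be reconstructed in the continuum circle limit. The method is shown to consistently recover the well-known infinite-line limit.
Paper reference translation (metadata) (2025-03-18T18:01:48Z)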
Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
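Gradient Descent (GD) is a powerful workhorse of modern machine learning. Its ability to find local minima is guaranteed only for losses with Lipschitz gradients. This work focuses on simple yet representative learning problems, analyzed through two-step gradient updates.
Paper reference translation (metadata) (2022-06-08T21:32:50Z)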
Lifting the Convex Conjugate in Lagrangian Relaxations: A Tractable Approach for Continuous Markov Random Fields [53.31927549039624]
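The paper shows that a piecewise discretization is consistent with existing discretizations. The theory is applied to the problem of matching two images.
Paper reference translation (metadata) (2021-07-13T12:31:06Z)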

The related papers list is generated automatically from the titles and abstracts of papers on this site.

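This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from use of this site (including all information and translations).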