Fugu-MT Paper Translation (Abstract): The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level

Paper overview: The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level

arxiv url: http://arxiv.org/abs/2604.02178v1
Date: Thu, 02 Apr 2026 15:41:24 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.892136
Title: The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level
Authors: Jeremy Herbst, Jae Hee Lee, Stefan Wermter
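Abstract summary: Mixture-of-Experts (MoE) has become the dominant choice for scaling Large Language Models (LLMs). We compare MoE experts and dense feed-forward networks using $k$-sparse probing. We find that expert neurons are consistently less polysemantic, with the gap widening as routing becomes sparser.
Reference score (independently computed attention metric): 9.716523835964045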
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Mixture-of-Experts (MoE) architectures have become the dominant choice for scaling Large Language Models (LLMs), activating only a subset of parameters per token. While MoE architectures are primarily adopted for computational efficiency, it remains an open question whether their sparsity makes them inherently easier to interpret than dense feed-forward networks (FFNs). We compare MoE experts and dense FFNs using $k$-sparse probing and find that expert neurons are consistently less polysemantic, with the gap widening as routing becomes sparser. This suggests that sparsity pressures both individual neurons and entire experts toward monosemanticity. Leveraging this finding, we zoom out from the neuron to the expert level as a more effective unit of analysis. We validate this approach by automatically interpreting hundreds of experts. This analysis allows us to resolve the debate on specialization: experts are neither broad domain specialists (e.g., biology) nor simple token-level processors. Instead, they function as fine-grained task experts, specializing in linguistic operations or semantic tasks (e.g., closing brackets in LaTeX). Our findings suggest that MoEs are inherently interpretable at the expert level, providing a clearer path toward large-scale model interpretability. Code is available at: https://github.com/jerryy33/MoE_analysis
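The abstract names $k$-sparse probing as the comparison tool without describing the procedure. The following is a minimal sketch of the general idea only, not the authors' implementation: pick the k neurons that best separate a binary feature, then fit a linear probe on those k activations alone. The ranking heuristic (absolute difference of class-mean activations), the function names, and the synthetic data are all assumptions made for illustration; see the linked repository for the actual method.

```python
import math
import random


def make_toy_activations(seed=0, n_samples=200, n_neurons=8, signal_neuron=3):
    """Synthetic activations (an assumption, not real model data).

    One neuron carries the binary feature; all others are pure noise.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    acts = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]
        row[signal_neuron] = 2.0 * y + rng.gauss(0.0, 0.5)
        acts.append(row)
    return acts, labels


def top_k_neurons(acts, labels, k):
    """Rank neurons by |mean(act | y=1) - mean(act | y=0)| and keep the top k."""
    pos = [a for a, y in zip(acts, labels) if y == 1]
    neg = [a for a, y in zip(acts, labels) if y == 0]
    scores = []
    for j in range(len(acts[0])):
        mu_pos = sum(a[j] for a in pos) / len(pos)
        mu_neg = sum(a[j] for a in neg) / len(neg)
        scores.append(abs(mu_pos - mu_neg))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_logistic_probe(X, y, steps=500, lr=0.5):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - t
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def k_sparse_probe(acts, labels, k):
    """Select k neurons, fit a probe on them, and return (indices, train accuracy)."""
    idx = top_k_neurons(acts, labels, k)
    X = [[row[j] for j in idx] for row in acts]
    w, b = fit_logistic_probe(X, labels)
    correct = 0
    for x, t in zip(X, labels):
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        correct += int((z > 0) == (t == 1))
    return idx, correct / len(labels)


acts, labels = make_toy_activations()
selected, accuracy = k_sparse_probe(acts, labels, k=1)
print("selected neurons:", selected, "train accuracy:", accuracy)
```

In this toy setup a probe with k = 1 already performs well because a single neuron encodes the feature, which is the intuition behind reading small-k probe accuracy as a sign of lower polysemanticity. A real analysis would use held-out data and activations from an actual model.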

Related papers