Fugu-MT Paper Translation (Abstract): Negative Pre-activations Differentiate Syntax

Paper summary: Negative Pre-activations Differentiate Syntax

arxiv url: http://arxiv.org/abs/2509.24198v1
Date: Mon, 29 Sep 2025 02:29:29 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.694467
Title: Negative Pre-activations Differentiate Syntax
Title (reference translation): Negative Pre-activations Differentiate Syntax
Authors: Linghao Kong, Angelina Ning, Micah Adler, Nir Shavit
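Abstract summary: A recently discovered class of entangled neurons, known as Wasserstein neurons, is disproportionately critical in large language models. The paper shows that negative differentiation in a sparse subset of entangled neurons is a crucial mechanism that language models rely on for syntax.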
Reference score (independently computed attention score): 3.623168857780243
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A recently discovered class of entangled neurons, known as Wasserstein neurons, is disproportionately critical in large language models despite constituting only a very small fraction of the network: their targeted removal collapses the model, consistent with their unique role in differentiating similar inputs. Interestingly, in Wasserstein neurons immediately preceding smooth activation functions, such differentiation manifests in the negative pre-activation space, especially in early layers. Pairs of similar inputs are driven to highly distinct negative values, and these pairs involve syntactic tokens such as determiners and prepositions. We show that this negative region is functional rather than simply favorable for optimization. A minimal, sign-specific intervention that zeroes only the negative pre-activations of a small subset of entangled neurons significantly weakens overall model function and disrupts grammatical behavior, while both random and perplexity-matched controls leave grammatical performance largely unchanged. Part of speech analysis localizes the excess surprisal to syntactic scaffolding tokens, and layer-specific interventions reveal that small local degradations accumulate across depth. Over training checkpoints, the same ablation impairs grammatical behavior as Wasserstein neurons emerge and stabilize. Together, these results identify negative differentiation in a sparse subset of entangled neurons as a crucial mechanism that language models rely on for syntax.
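Abstract (reference translation): A recently discovered class of entangled neurons, known as Wasserstein neurons, is disproportionately important in large language models even though it makes up only a very small part of the network: removing these neurons collapses the model, which is consistent with their role in differentiating similar inputs. Interestingly, in Wasserstein neurons that immediately precede smooth activation functions, this differentiation appears in the negative pre-activation space, especially in early layers. Pairs of similar inputs are driven to very different negative values, and these pairs involve syntactic tokens such as determiners and prepositions. The authors show that this negative region is functional, not merely favorable for optimization. A minimal, sign-specific intervention that zeroes only the negative pre-activations of a small subset of entangled neurons substantially weakens overall model function and disrupts grammatical behavior, whereas random and perplexity-matched controls leave grammatical performance largely unchanged. Part-of-speech analysis localizes the excess surprisal to syntactic scaffolding tokens, and layer-specific interventions show that small local degradations accumulate across depth. Across training checkpoints, the same ablation impairs grammatical behavior as Wasserstein neurons emerge and stabilize. Together, these results identify negative differentiation in a sparse subset of entangled neurons as a crucial mechanism that language models rely on for syntax.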
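
The sign-specific intervention described in the abstract (zeroing only the negative pre-activations of a small subset of neurons) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `zero_negative_preactivations`, the array layout (rows are tokens, columns are neurons), and the neuron indices are all hypothetical, and the way the paper selects Wasserstein neurons is not reproduced here.

```python
import numpy as np

def zero_negative_preactivations(pre_acts, neuron_idx):
    """Return a copy of pre_acts in which negative values of the selected
    neurons (columns) are set to 0. All other entries are left unchanged."""
    out = np.array(pre_acts, dtype=float, copy=True)
    # Clamp only the chosen columns from below at zero.
    out[..., neuron_idx] = np.maximum(out[..., neuron_idx], 0.0)
    return out

# Toy pre-activations: 2 tokens x 3 neurons (values are made up).
pre = np.array([[-2.0, 0.5, -1.0],
                [ 1.0, -3.0, -0.25]])

# Hypothetical choice: treat neurons 0 and 2 as the targeted subset.
ablated = zero_negative_preactivations(pre, [0, 2])
print(ablated)
```

In this toy case the negative entries of neurons 0 and 2 become zero, while the negative value of the untargeted neuron 1 and all positive values pass through untouched. A random control in the spirit of the abstract would apply the same function to a randomly drawn index set of equal size.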

Related papers