Fugu-MT paper translation (abstract): Superposition disentanglement of neural representations reveals hidden alignment

Paper abstract: Superposition disentanglement of neural representations reveals hidden alignment

arxiv url: http://arxiv.org/abs/2510.03186v1
Date: Fri, 03 Oct 2025 17:12:40 GMT
Status: Translation complete
Last updated in system: 2025-10-06 16:35:52.503963
Title: Superposition disentanglement of neural representations reveals hidden alignment
Title (reference translation): Superposition disentanglement of neural representations exhibiting hidden alignment
Authors: André Longon, David Klindt, Meenakshi Khosla
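Abstract summary: In neuroscience and AI, representational alignment metrics measure the extent to which different deep neural networks (DNNs) or brains represent similar information. We develop a theory for how the strict permutation metrics depend on superposition arrangements. Our results suggest that superposition disentanglement is necessary to uncover the true representational alignment between neural codes.
Reference score (independently computed attention score): 6.015414975356222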
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The superposition hypothesis states that a single neuron within a population may participate in the representation of multiple features in order for the population to represent more features than the number of neurons. In neuroscience and AI, representational alignment metrics measure the extent to which different deep neural networks (DNNs) or brains represent similar information. In this work, we explore a critical question: does superposition interact with alignment metrics in any undesirable way? We hypothesize that models which represent the same features in different superposition arrangements, i.e., their neurons have different linear combinations of the features, will interfere with predictive mapping metrics (semi-matching, soft-matching, linear regression), producing lower alignment than expected. We first develop a theory for how the strict permutation metrics are dependent on superposition arrangements. This is tested by training sparse autoencoders (SAEs) to disentangle superposition in toy models, where alignment scores are shown to typically increase when a model's base neurons are replaced with its sparse overcomplete latent codes. We find similar increases for DNN→DNN and DNN→brain linear regression alignment in the visual domain. Our results suggest that superposition disentanglement is necessary for mapping metrics to uncover the true representational alignment between neural codes.
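Abstract (reference translation): The superposition hypothesis states that a single neuron within a population may participate in the representation of multiple features, so that the population can represent more features than it has neurons. In neuroscience and AI, representational alignment metrics measure the extent to which different deep neural networks (DNNs) or brains represent similar information. This work asks whether superposition interacts with alignment metrics in any undesirable way. The hypothesis is that models which represent the same features in different superposition arrangements, i.e., whose neurons carry different linear combinations of the features, interfere with predictive mapping metrics (semi-matching, soft-matching, linear regression) and produce lower alignment than expected. First, a theory is developed for how the strict permutation metrics depend on superposition arrangements. This is tested by training sparse autoencoders (SAEs) to disentangle superposition in toy models, where alignment scores typically increase when a model's base neurons are replaced with its sparse overcomplete latent codes. Similar increases are found for DNN→DNN and DNN→brain linear regression alignment in the visual domain. These results suggest that superposition disentanglement is necessary for mapping metrics to uncover the true representational alignment between neural codes.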
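
The abstract's central claim, that two models can carry the same features yet score poorly on a strict one-to-one (permutation) metric when their neurons mix those features differently, can be illustrated with a small toy. The sketch below is not the authors' code or experimental setup. It assumes random Gaussian mixing weights, a brute-force permutation search, and, most importantly, it uses the ground-truth features themselves as a stand-in for ideally trained SAE latents rather than training any SAE. All names (`permutation_score`, `project`, the sizes) are hypothetical.

```python
import itertools
import math
import random

random.seed(0)
N_SAMPLES, N_FEATURES, N_NEURONS = 500, 5, 3  # more features than neurons


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def permutation_score(units_a, units_b):
    """Best mean correlation over all one-to-one pairings of units.

    Brute force over permutations, so only suitable for a handful of units.
    Both models must have the same number of units.
    """
    k = len(units_a)
    corr = [[pearson(a, b) for b in units_b] for a in units_a]
    return max(
        sum(corr[i][p[i]] for i in range(k)) / k
        for p in itertools.permutations(range(k))
    )


def project(features, weights):
    """Mix features into neurons: each neuron is a weighted sum of features."""
    n_samples = len(features[0])
    return [
        [sum(w * f[t] for w, f in zip(row, features)) for t in range(n_samples)]
        for row in weights
    ]


# Sparse ground-truth features: each is active on roughly 10% of samples.
features = [
    [random.random() if random.random() < 0.1 else 0.0 for _ in range(N_SAMPLES)]
    for _ in range(N_FEATURES)
]

# Two hypothetical models mix the SAME features with DIFFERENT random weights,
# i.e. different superposition arrangements (an assumption of this toy).
weights_a = [[random.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_NEURONS)]
weights_b = [[random.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_NEURONS)]
neurons_a = project(features, weights_a)
neurons_b = project(features, weights_b)

# Idealized disentangled codes (stand-in for perfect SAE latents):
# the same features, merely stored in a different order in model B.
latents_a = features
latents_b = random.sample(features, k=len(features))

neuron_score = permutation_score(neurons_a, neurons_b)
latent_score = permutation_score(latents_a, latents_b)
print(f"neuron-level permutation score: {neuron_score:.3f}")
print(f"latent-level permutation score: {latent_score:.3f}")
```

In this idealized setting the latent-level score is essentially perfect because the two codes differ only by a reordering, while the neuron-level score is lower even though both models encode identical information. Real SAE latents would recover the features only approximately, so the gap in practice would depend on SAE quality.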

Related papers