Fugu-MT paper translation (abstract): Disentangling concept semantics via multilingual averaging in Sparse Autoencoders

Paper overview: Disentangling concept semantics via multilingual averaging in Sparse Autoencoders

arxiv url: http://arxiv.org/abs/2508.14275v1
Date: Tue, 19 Aug 2025 21:18:56 GMT
Status: translation complete
System update date: 2025-08-21 16:52:41.273671
Title: Disentangling concept semantics via multilingual averaging in Sparse Autoencoders
Authors: Cliff O'Reilly, Ernesto Jimenez-Ruiz, Tillman Weyde,
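Abstract summary: We propose a method that isolates concept semantics in Large Language Models by averaging concept activations derived via Sparse Autoencoders. Using the open source Gemma Scope suite of Sparse Autoencoders, we obtain concept activations for each class and language version. Our results indicate that the conceptual average aligns to the true relationship between classes when compared with a single language by itself.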
Reference score (independently computed attention score): 3.1542695050861544
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Connecting LLMs with formal knowledge representation and reasoning is a promising approach to address their shortcomings. Embeddings and sparse autoencoders are widely used to represent textual content, but the semantics are entangled with syntactic and language-specific information. We propose a method that isolates concept semantics in Large Language Models by averaging concept activations derived via Sparse Autoencoders. We create English text representations from OWL ontology classes, translate the English into French and Chinese and then pass these texts as prompts to the Gemma 2B LLM. Using the open source Gemma Scope suite of Sparse Autoencoders, we obtain concept activations for each class and language version. We average the different language activations to derive a conceptual average. We then correlate the conceptual averages with a ground truth mapping between ontology classes. Our results give a strong indication that the conceptual average aligns to the true relationship between classes when compared with a single language by itself. The result hints at a new technique which enables mechanistic interpretation of internal network states with higher accuracy.
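The averaging and correlation steps described in the abstract can be illustrated with a minimal sketch. It does not call Gemma 2B or Gemma Scope: random arrays stand in for the per-language SAE activations, and the function names, the use of cosine similarity between classes, and the use of Pearson correlation against a binary ground-truth matrix are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def conceptual_average(acts_by_lang):
    """Element-wise mean over languages of (n_classes, n_features) arrays."""
    return np.stack(list(acts_by_lang.values()), axis=0).mean(axis=0)


def cosine_similarity_matrix(acts):
    """Pairwise cosine similarity between the rows (classes) of acts."""
    norms = np.linalg.norm(acts, axis=1, keepdims=True)
    unit = acts / np.clip(norms, 1e-12, None)  # guard against zero rows
    return unit @ unit.T


def correlation_with_ground_truth(acts, ground_truth):
    """Pearson correlation between class-pair similarities and a
    ground-truth relatedness matrix, using each unordered pair once."""
    sim = cosine_similarity_matrix(acts)
    upper = np.triu_indices(sim.shape[0], k=1)  # strict upper triangle
    return float(np.corrcoef(sim[upper], ground_truth[upper])[0, 1])


# Synthetic stand-in data (hypothetical, not from the paper):
# six classes in three related pairs; classes in a pair share a prototype.
rng = np.random.default_rng(0)
groups = np.array([0, 0, 1, 1, 2, 2])
ground_truth = (groups[:, None] == groups[None, :]).astype(float)
prototypes = rng.normal(size=(3, 256))
shared = prototypes[groups]  # language-independent "concept" signal

# Each language adds its own independent noise to the shared signal.
acts_by_lang = {
    lang: shared + 2.0 * rng.normal(size=shared.shape)
    for lang in ("en", "fr", "zh")
}

for lang, acts in acts_by_lang.items():
    print(lang, correlation_with_ground_truth(acts, ground_truth))

average = conceptual_average(acts_by_lang)
print("average", correlation_with_ground_truth(average, ground_truth))
```

On synthetic data of this kind, averaging reduces the language-specific noise, so the averaged activations would typically correlate more strongly with the ground truth than any single language. This only illustrates the mechanism and does not reproduce the paper's results.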

Related papers