Fugu-MT paper translation (abstract): ActivationReasoning: Logical Reasoning in Latent Activation Spaces

Paper overview: ActivationReasoning: Logical Reasoning in Latent Activation Spaces

arxiv url: http://arxiv.org/abs/2510.18184v1
Date: Tue, 21 Oct 2025 00:21:04 GMT
Status: Translation complete
System update date: 2025-10-25 03:08:12.736958
Title: ActivationReasoning: Logical Reasoning in Latent Activation Spaces
Authors: Lukas Helff, Ruben Härle, Wolfgang Stammer, Felix Friedrich, Manuel Brack, Antonia Wüst, Hikaru Shindo, Patrick Schramowski, Kristian Kersting
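Abstract summary: Large language models (LLMs) excel at generating fluent text, but their internal reasoning remains opaque and difficult to control. The paper introduces ActivationReasoning (AR), a framework that embeds explicit logical reasoning into the latent space of LLMs. AR scales robustly with reasoning complexity, generalizes to abstract and context-sensitive tasks, and transfers across model backbones.
Reference score (independently computed attention score): 43.17973499652433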
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) excel at generating fluent text, but their internal reasoning remains opaque and difficult to control. Sparse autoencoders (SAEs) make hidden activations more interpretable by exposing latent features that often align with human concepts. Yet these features are fragile and passive, offering no mechanism for systematic reasoning or model control. To address this, we introduce ActivationReasoning (AR), a framework that embeds explicit logical reasoning into the latent space of LLMs. It proceeds in three stages: (1) Finding latent representations: latent concept representations are first identified (e.g., via SAEs) and organized into a dictionary; (2) Activating propositions: at inference time, AR detects activating concepts and maps them to logical propositions; and (3) Logical reasoning: logical rules are applied over these propositions to infer higher-order structures, compose new concepts, and steer model behavior. We evaluate AR on multi-hop reasoning (PrOntoQA), abstraction and robustness to indirect concept cues (Rail2Country), reasoning over natural and diverse language (ProverQA), and context-sensitive safety (BeaverTails). Across all tasks, AR scales robustly with reasoning complexity, generalizes to abstract and context-sensitive tasks, and transfers across model backbones. These results demonstrate that grounding logical structure in latent activations not only improves transparency but also enables structured reasoning, reliable control, and alignment with desired behaviors, providing a path toward more reliable and auditable AI.
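
The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the concept names, the 3-dimensional vectors, the dot-product threshold, and the helper names `activate_propositions` and `forward_chain` are all assumptions made for illustration. In particular, the abstract does not say how concepts are detected or which rule language AR uses; simple thresholding and Horn-style forward chaining are swapped in here.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Stage 1 (assumed form): a dictionary from concept names to latent directions.
# In practice these might come from SAE features; the values here are toy data.
CONCEPTS: Dict[str, List[float]] = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
}

# Hypothetical rules as (premises, conclusion) pairs over proposition names.
Rule = Tuple[FrozenSet[str], str]
RULES: List[Rule] = [
    (frozenset({"cat"}), "mammal"),
    (frozenset({"dog"}), "mammal"),
    (frozenset({"mammal"}), "animal"),
]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def activate_propositions(
    activation: List[float],
    concepts: Dict[str, List[float]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Set[str]:
    """Stage 2 (assumed form): a concept becomes a true proposition when
    the activation's projection onto its direction exceeds the threshold."""
    return {
        name
        for name, direction in concepts.items()
        if dot(activation, direction) > threshold
    }


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Stage 3 (assumed form): apply rules until no new proposition is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


if __name__ == "__main__":
    hidden = [0.9, 0.1, 0.0]  # toy stand-in for an LLM hidden activation
    facts = activate_propositions(hidden, CONCEPTS)
    print(sorted(forward_chain(facts, RULES)))
```

Here the single detected concept `cat` yields `mammal` and then `animal` through two rule applications, a small instance of the multi-hop inference the abstract mentions. Steering model behavior is left out of the sketch because the abstract does not describe how it is carried out.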


Related papers