Fugu-MT Paper Translation (Summary): REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model

Paper summary: REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model

arxiv url: http://arxiv.org/abs/2509.22518v1
Date: Fri, 26 Sep 2025 16:02:27 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.566872
Title: REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
Title (reference translation): REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Models
Authors: Bo Li, Guanzhi Deng, Ronghao Chen, Junrong Yue, Shuo Zhang, Qinghua Zhao, Linqi Song, Lijie Wen,
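Abstract summary: The Reasoning Manifold is a latent low-dimensional geometric structure formed by the internal representations corresponding to all correctly reasoned generations. We build REMA, a framework that explains the origins of failures by quantitatively comparing the spatial relationships of internal model representations corresponding to erroneous and correct reasoning samples. Experiments on diverse language and multimodal models and tasks demonstrate the low-dimensional nature of the reasoning manifold and the high separability between erroneous and correct reasoning representations.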
Reference score (in-house attention score): 29.40036398095681
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Understanding how Large Language Models (LLMs) perform complex reasoning, and the mechanisms by which that reasoning fails, is a challenge in interpretability research. To provide a measurable geometric analysis perspective, we define the concept of the Reasoning Manifold, a latent low-dimensional geometric structure formed by the internal representations corresponding to all correctly reasoned generations. This structure can be conceptualized as the embodiment of the effective thinking paths that the model has learned to successfully solve a given task. Based on this concept, we build REMA, a framework that explains the origins of failures by quantitatively comparing the spatial relationships of internal model representations corresponding to both erroneous and correct reasoning samples. Specifically, REMA first quantifies the geometric deviation of each erroneous representation by calculating its k-nearest-neighbor distance to the approximated manifold formed by correct representations, thereby providing a unified failure signal. It then localizes the divergence points where these deviations first become significant by tracking this deviation metric across the model's layers and comparing it against a baseline of internal fluctuations from correct representations, thus identifying where the reasoning chain begins to go off track. Our extensive experiments on diverse language and multimodal models and tasks demonstrate the low-dimensional nature of the reasoning manifold and the high separability between erroneous and correct reasoning representations. The results also validate the effectiveness of the REMA framework in analyzing the origins of reasoning failures. This research connects abstract reasoning failures to measurable geometric deviations in representations, providing new avenues for in-depth understanding and diagnosis of the internal computational processes of black-box models.
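The two steps described in the abstract can be sketched in plain Python. The first is a k-nearest-neighbor deviation score against correct representations. The second is a layer-by-layer search for the first point where that score leaves the correct samples' own range of fluctuation. This is a minimal illustration under stated assumptions and not the authors' implementation. The helper names `knn_deviation`, `correct_baseline`, and `divergence_layer` are hypothetical. The deviation is assumed to be the mean Euclidean distance to the k nearest correct samples. The significance rule (mean plus `z` standard deviations of leave-one-out deviations of the correct samples) is also an assumption. The abstract does not specify the distance, the value of `k`, or the test used.

```python
import math
import statistics
from typing import List, Optional, Sequence

Vector = Sequence[float]


def knn_deviation(query: Vector, manifold: Sequence[Vector], k: int) -> float:
    """Mean Euclidean distance from `query` to its k nearest correct representations."""
    distances = sorted(math.dist(query, point) for point in manifold)
    return statistics.fmean(distances[:k])


def correct_baseline(manifold: Sequence[Vector], k: int) -> List[float]:
    """Leave-one-out deviations of the correct samples (their internal fluctuation)."""
    return [
        knn_deviation(point, [p for j, p in enumerate(manifold) if j != i], k)
        for i, point in enumerate(manifold)
    ]


def divergence_layer(
    error_by_layer: Sequence[Vector],
    correct_by_layer: Sequence[Sequence[Vector]],
    k: int = 2,
    z: float = 2.0,
) -> Optional[int]:
    """First layer where an erroneous sample leaves the correct-sample baseline.

    Hypothetical threshold rule: mean + z * std of the leave-one-out baseline.
    Returns None if the deviation never exceeds the threshold.
    """
    for layer, (query, manifold) in enumerate(zip(error_by_layer, correct_by_layer)):
        baseline = correct_baseline(manifold, k)
        threshold = statistics.fmean(baseline) + z * statistics.pstdev(baseline)
        if knn_deviation(query, manifold, k) > threshold:
            return layer
    return None


# Toy data: 2-D "representations" at three layers, same correct cluster at each layer.
cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
correct_by_layer = [cluster, cluster, cluster]
error_by_layer = [(0.05, 0.05), (0.06, 0.05), (1.0, 1.0)]  # drifts away at layer 2
print(divergence_layer(error_by_layer, correct_by_layer))  # → 2
```

Each correct sample is scored leave-one-out so that it is never compared with itself. Without this, the baseline would be biased toward zero and almost any erroneous sample would look significant.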
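Abstract (reference translation): Understanding how large language models (LLMs) perform complex reasoning, and how that reasoning fails, is a challenge in interpretability research. From the perspective of measurable geometric analysis, we define the concept of the Reasoning Manifold, a latent low-dimensional geometric structure formed by the internal representations corresponding to all correctly reasoned generations. This structure can be conceptualized as the embodiment of the effective thinking paths the model has learned in order to solve a given task successfully. Based on this concept, we build REMA, a framework that explains the origins of failures by quantitatively comparing the spatial relationships of internal model representations corresponding to both erroneous and correct reasoning. Specifically, REMA first quantifies the geometric deviation of each erroneous representation by computing its k-nearest-neighbor distance to the approximate manifold formed by correct representations, yielding a unified failure signal. Next, it localizes the divergence points where these deviations first become significant by tracking this deviation metric across the model's layers and comparing it against a baseline of internal fluctuations of correct representations, thereby identifying where the reasoning chain begins to go off track. Extensive experiments on diverse language and multimodal models and tasks demonstrate the low-dimensional nature of the reasoning manifold and the high separability between erroneous and correct reasoning representations. The results also validate the effectiveness of the REMA framework in analyzing the causes of reasoning failures. This work connects abstract reasoning failures to measurable geometric deviations in representations, opening new avenues for in-depth understanding and diagnosis of the internal computational processes of black-box models.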
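The "low-dimensional nature" claim can be made concrete with one common effective-dimensionality measure, the participation ratio (Σλ)² / Σλ² of the covariance eigenvalues λ. This measure is swapped in for illustration and is not necessarily the estimator the paper uses. The function name `participation_ratio` is hypothetical. The sketch avoids an eigendecomposition by using two trace identities for a symmetric covariance C: Σλ = tr(C) and Σλ² = ‖C‖_F².

```python
from typing import Sequence


def participation_ratio(points: Sequence[Sequence[float]]) -> float:
    """Effective dimensionality (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    Computed from the covariance via trace identities, so no eigendecomposition is needed.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Covariance with a 1/n scale; the scale cancels in the ratio, so 1/n vs 1/(n-1) is irrelevant.
    cov = [
        [sum(row[a] * row[b] for row in centered) / n for b in range(d)]
        for a in range(d)
    ]
    trace = sum(cov[a][a] for a in range(d))
    frob_sq = sum(cov[a][b] ** 2 for a in range(d) for b in range(d))
    return trace ** 2 / frob_sq


# Points on a 1-D line embedded in 3-D space, versus points spread along all three axes.
line = [(t, 2 * t, 0.0) for t in range(5)]
axes = [(1.0, 0, 0), (-1.0, 0, 0), (0, 1.0, 0), (0, -1.0, 0), (0, 0, 1.0), (0, 0, -1.0)]
print(participation_ratio(line))  # → 1.0
print(round(participation_ratio(axes), 6))  # → 3.0
```

A value near 1 means the points vary along essentially one direction. A value far below the ambient dimension of the hidden states would be consistent with correct representations lying on a low-dimensional manifold.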

Related papers