Fugu-MT Paper Translation (Summary): Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models

Paper summary: Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models

arxiv url: http://arxiv.org/abs/2505.18244v1
Date: Fri, 23 May 2025 16:55:35 GMT
Status: Translation complete
Last updated in system: 2025-05-27 16:58:42.299936
Title: Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models
Authors: Yukin Zhang, Qi Dong
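Abstract summary: Large Transformer-based language models achieve remarkable performance but remain opaque in how they plan, structure, and realize text. We introduce Multi-Scale Probabilistic Generation Theory (MSPGT), a hierarchical framework that factorizes generation into three semantic scales: global context, intermediate structure, and local word choices.
Reference score (independently computed attention level): 1.2027959564488593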
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Transformer-based language models achieve remarkable performance but remain opaque in how they plan, structure, and realize text. We introduce Multi-Scale Probabilistic Generation Theory (MSPGT), a hierarchical framework that factorizes generation into three semantic scales (global context, intermediate structure, and local word choices) and aligns each scale with specific layer ranges in Transformer architectures. To identify scale boundaries, we propose two complementary metrics: attention span thresholds and inter-layer mutual information peaks. Across four representative models (GPT-2, BERT, RoBERTa, and T5), these metrics yield stable local/intermediate/global partitions, corroborated by probing tasks and causal interventions. We find that decoder-only models allocate more layers to intermediate and global processing, while encoder-only models emphasize local feature extraction. Through targeted interventions, we demonstrate that local-scale manipulations primarily influence lexical diversity, intermediate-scale modifications affect sentence structure and length, and global-scale perturbations impact discourse coherence, all with statistically significant effects. MSPGT thus offers a unified, architecture-agnostic method for interpreting, diagnosing, and controlling large language models, bridging the gap between mechanistic interpretability and emergent capabilities.
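
The abstract names attention span thresholds as one of the two boundary metrics but does not define them on this page. The following is a minimal sketch of the general idea, assuming that a layer's "attention span" is the attention-weighted mean distance between query and key positions, and that two cutoffs split layers into local, intermediate, and global. The function names, the span definition, the toy matrices, and the threshold values (`local_max`, `global_min`) are all hypothetical illustrations, not the authors' method.

```python
from typing import List

Matrix = List[List[float]]  # attention[i][j]: weight from query i to key j


def mean_attention_span(attention: Matrix) -> float:
    """Attention-weighted mean |i - j| distance, averaged over query positions.

    Assumed definition for illustration; the paper may define span differently.
    """
    total = 0.0
    for i, row in enumerate(attention):
        total += sum(w * abs(i - j) for j, w in enumerate(row))
    return total / len(attention)


def assign_scales(spans: List[float], local_max: float, global_min: float) -> List[str]:
    """Label each layer by comparing its span to two hypothetical thresholds."""
    labels = []
    for span in spans:
        if span < local_max:
            labels.append("local")
        elif span < global_min:
            labels.append("intermediate")
        else:
            labels.append("global")
    return labels


# Toy 3-token attention maps standing in for three layers (made-up values).
diagonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
neighbor = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3

spans = [mean_attention_span(m) for m in (diagonal, neighbor, uniform)]
labels = assign_scales(spans, local_max=0.5, global_min=0.8)
print(labels)
```

In practice the attention maps would come from a real model, averaged over heads and inputs, and the thresholds would need to be chosen empirically; the second metric mentioned in the abstract (inter-layer mutual information peaks) is not covered by this sketch.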

Related papers

SJTU:Spatial judgments in multimodal models towards unified segmentation through coordinate detection [4.930667479611019]
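This paper proposes SJTU (Spatial judgments in multimodal models towards unified segmentation through coordinate detection), a framework that integrates segmentation techniques with vision-language models through spatial reasoning in multimodal space. It shows strong performance across benchmark datasets, achieving IoU scores of 0.5958 on COCO 2017 and 0.6758 on Pascal VOC.
Paper reference translation (metadata) (2024-12-03T16:53:58Z)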
Interpreting token compositionality in LLMs: A robustness analysis [10.777646083061395]
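Constituent-Aware Pooling (CAP) is a methodology designed to analyze how large language models process linguistic structures. CAP intervenes on model activations through constituent-based pooling at various model levels. The study reveals fundamental limitations of current Transformer architectures with respect to compositional semantics processing and model interpretability.
Paper reference translation (metadata) (2024-10-16T18:10:50Z)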
One-for-All: Towards Universal Domain Translation with a Single StyleGAN [86.33216867136639]
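We propose UniTranslator, a new translation model for transforming representations between visually distinct domains. The proposed UniTranslator is versatile and can perform a variety of tasks, including style mixing, stylization, and translation. UniTranslator surpasses the performance of existing general-purpose models and performs well against specialized models on representative tasks.
Paper reference translation (metadata) (2023-10-22T08:02:55Z)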
Investigating semantic subspaces of Transformer sentence embeddings through linear structural probing [2.5002227227256864]
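This study conducts experiments with semantic structural probing, a method for studying sentence-level representations. The method is applied to language models of different families (encoder-only, decoder-only, encoder-decoder) and different sizes in the context of two tasks. The model families differ substantially in their performance and layer dynamics, but the results are largely invariant to model size.
Paper reference translation (metadata) (2023-10-18T12:32:07Z)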
Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation [53.04781510348416]
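Video-based 3D human pose and shape estimation is evaluated by intra-frame accuracy and inter-frame smoothness. We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, the Global-to-Local Transformer (GLoT). Our GLoT surpasses previous state-of-the-art methods with the lowest model parameters on popular benchmarks (3DPW, MPI-INF-3DHP, Human3.6M).
Paper reference translation (metadata) (2023-03-26T14:57:49Z)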
Hierarchical Local-Global Transformer for Temporal Sentence Grounding [58.247592985849124]
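This paper studies the multimedia problem of temporal sentence grounding, which aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query.
Paper reference translation (metadata) (2022-08-31T14:16:56Z)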
Examining Scaling and Transfer of Language Model Architectures for Machine Translation [51.69212730675345]
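Language models (LMs) process input and output with a single stack of layers, whereas encoder-decoder models (EncDec) use separate layer stacks for input and output processing. In machine translation, EncDec has long been the favored approach, but there is little research on the performance of LMs.
Paper reference translation (metadata) (2022-02-01T16:20:15Z)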

The related papers list is generated automatically from the titles and abstracts of papers on this site.
