Fugu-MT paper translation (abstract): An expressivity analysis of hierarchical modelling in deep transformers via bounded-depth grammars

Paper overview: An expressivity analysis of hierarchical modelling in deep transformers via bounded-depth grammars

arxiv url: http://arxiv.org/abs/2606.17522v1
Date: Tue, 16 Jun 2026 05:02:13 GMT
Status: translation complete
Last updated in system: 2026-06-17 17:15:32.27739
Title: An expressivity analysis of hierarchical modelling in deep transformers via bounded-depth grammars
Title (reference translation): Expressivity analysis of hierarchical modelling in deep transformers using bounded-depth grammars
Authors: Vinoth Nandakumar, Qiang Qu, Pramod Thebe, Sakshi Khachariya, Tongliang Liu
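Abstract summary: Deep neural networks are widely believed to derive their expressive power from their ability to form hierarchical representations. In language modeling, transformers have emerged as the dominant architecture, with early layers capturing local syntactic patterns and later layers encoding more complex clause-level dependencies. The paper argues that these architectures have the structural capacity to encode abstract grammatical states into low-dimensional, linearly separable subspaces within the residual stream.
Reference score (independently computed attention metric): 54.11540943172608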
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep neural networks are widely believed to derive their expressive power from their ability to form hierarchical representations, capturing progressively more abstract and compositional features across layers. In language modeling, transformers have emerged as the dominant architecture, with early layers capturing local syntactic patterns and later layers encoding more complex clause-level dependencies. While this intuition has shaped model design, there remains a lack of rigorous theoretical work demonstrating how deep transformers represent such hierarchical structures. In this work, we analyze the expressiveness of deep transformer models through the formal lens of bounded-depth, non-recursive context-free grammars. For this class of grammars, we explicitly construct transformers with positional attention whose depth grows linearly with grammar depth, while the neuron count scales with the number of derivation-tree shapes and quadratically with the number of production rules. Our theoretical results support the linear representation hypothesis by demonstrating that these architectures possess the structural capacity to encode abstract grammatical states into low-dimensional, linearly separable subspaces within the residual stream.
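Abstract (reference translation): Deep neural networks are widely believed to derive their expressive power from their ability to form hierarchical representations, progressively capturing more abstract and compositional features across layers. In language modeling, transformers have emerged as the dominant architecture, with early layers capturing local syntactic patterns and later layers encoding more complex clause-level dependencies. Although this intuition has shaped model design, rigorous theoretical work showing how deep transformers represent such hierarchical structures is still lacking. In this work, we analyze the expressiveness of deep transformer models through the formal lens of bounded-depth, non-recursive context-free grammars. For this class of grammars, we explicitly construct transformers with positional attention whose depth grows linearly with grammar depth, while the neuron count scales with the number of derivation-tree shapes and quadratically with the number of production rules. Our theoretical results support the linear representation hypothesis by showing that these architectures have the structural capacity to encode abstract grammatical states into low-dimensional, linearly separable subspaces within the residual stream.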
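
To make the phrase "bounded-depth, non-recursive context-free grammar" concrete, here is a minimal Python sketch. It is not the paper's construction: the toy grammar, the function name `grammar_depth`, and the definition of depth as the height of the tallest derivation tree are all assumptions made for illustration. The abstract does not give the authors' exact definitions.

```python
# Hypothetical toy grammar (not from the paper). Each nonterminal maps to a
# list of productions; any symbol that is not a key is treated as a terminal.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["cat"], ["dog"]],
    "V": [["sees"], ["sleeps"]],
}


def grammar_depth(grammar, start):
    """Height of the tallest derivation tree rooted at `start`.

    Assumed definition: terminals have depth 0 and a nonterminal has depth
    1 + the maximum depth of any symbol on any of its right-hand sides.
    Raises ValueError if a nonterminal can reach itself, because a recursive
    grammar has no finite depth bound.
    """
    memo = {}        # finished nonterminals -> depth
    on_path = set()  # nonterminals on the current expansion path

    def depth(symbol):
        if symbol not in grammar:
            return 0  # terminal
        if symbol in memo:
            return memo[symbol]
        if symbol in on_path:
            raise ValueError(f"recursive nonterminal: {symbol}")
        on_path.add(symbol)
        child_depths = (depth(s) for rhs in grammar[symbol] for s in rhs)
        result = 1 + max(child_depths, default=0)
        on_path.discard(symbol)
        memo[symbol] = result
        return result

    return depth(start)


if __name__ == "__main__":
    print(grammar_depth(GRAMMAR, "S"))
```

Under this assumed definition, the depth is the quantity the abstract says the transformer's layer count grows linearly with. A grammar such as `{"A": [["A", "x"]]}` is rejected because it is recursive.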

Related papers