Fugu-MT paper translation (abstract): Characterizing Intrinsic Compositionality In Transformers With Tree Projections

Paper abstract: Characterizing Intrinsic Compositionality In Transformers With Tree Projections

arxiv url: http://arxiv.org/abs/2211.01288v1
Date: Wed, 2 Nov 2022 17:10:07 GMT
Status: translation complete
System update date: 2022-11-03 13:20:53.859229
Title: Characterizing Intrinsic Compositionality In Transformers With Tree Projections
Authors: Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning
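Abstract summary: Neural models such as transformers can route information arbitrarily between different parts of their input. The authors show that transformers for three different tasks become more tree-like over the course of training. These trees are predictive of model behavior, and more tree-like models generalize better on tests of compositional generalization.
Reference score (independently computed attention measure): 72.45375959893218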
License: http://creativecommons.org/licenses/by/4.0/
Abstract: When trained on language data, do transformers learn some arbitrary computation that utilizes the full capacity of the architecture or do they learn a simpler, tree-like computation, hypothesized to underlie compositional meaning systems like human languages? There is an apparent tension between compositional accounts of human language understanding, which are based on a restricted bottom-up computational process, and the enormous success of neural models like transformers, which can route information arbitrarily between different parts of their input. One possibility is that these models, while extremely flexible in principle, in practice learn to interpret language hierarchically, ultimately building sentence representations close to those predictable by a bottom-up, tree-structured model. To evaluate this possibility, we describe an unsupervised and parameter-free method to functionally project the behavior of any transformer into the space of tree-structured networks. Given an input sentence, we produce a binary tree that approximates the transformer's representation-building process and a score that captures how "tree-like" the transformer's behavior is on the input. While calculation of this score does not require training any additional models, it provably upper-bounds the fit between a transformer and any tree-structured approximation. Using this method, we show that transformers for three different tasks become more tree-like over the course of training, in some cases unsupervisedly recovering the same trees as supervised parsers. These trees, in turn, are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.

Related papers