Fugu-MT paper translation (abstract): Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks

arxiv url: http://arxiv.org/abs/2109.15256v1
Date: Thu, 30 Sep 2021 16:41:19 GMT
Status: Translation complete
Last updated in system: 2021-10-01 15:10:06.893546
Title: Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks
Authors: Yichen Jiang, Mohit Bansal
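Abstract summary: Systematic compositionality is an essential mechanism in human language, allowing known parts to be recombined into novel expressions. Existing neural models have been shown to lack this basic ability to learn symbolic structures. This paper proposes two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
Reference score (attention score computed by this site): 86.10875837475783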
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions. However, existing neural models have been shown to lack this basic ability in learning symbolic structures. Motivated by the failure of a Transformer model on the SCAN compositionality challenge (Lake and Baroni, 2018), which requires parsing a command into actions, we propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics, as additional training supervision. These automatically-generated sequences are more representative of the underlying compositional symbolic structures of the input data. During inference, the model jointly predicts the next action and the next tokens in the auxiliary sequences at each step. Experiments on the SCAN dataset show that our method encourages the Transformer to understand compositional structures of the command, improving its accuracy on multiple challenging splits from <= 10% to 100%. With only 418 (5%) training instances, our approach still achieves 97.8% accuracy on the MCD1 split. Therefore, we argue that compositionality can be induced in Transformers given minimal but proper guidance. We also show that a better result is achieved using less contextualized vectors as the attention's query, providing insights into architecture choices in achieving systematic compositionality. Finally, we show positive generalization results on the groundedSCAN task (Ruis et al., 2020). Our code is publicly available at: https://github.com/jiangycTarheel/compositional-auxseq
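To make "parsing a command into actions" concrete, the following is a minimal sketch of a SCAN-style interpreter, assuming the command grammar described by Lake and Baroni (2018). The function names are hypothetical, and the `I_`-prefixed action tokens are an assumption about the dataset's output vocabulary. The sketch covers only the command-to-action mapping, not the paper's model or its auxiliary sequences.

```python
# Sketch of a SCAN-style command interpreter (assumed grammar, hypothetical names).
PRIMITIVES = {"walk": "I_WALK", "look": "I_LOOK", "run": "I_RUN", "jump": "I_JUMP"}
TURNS = {"left": "I_TURN_LEFT", "right": "I_TURN_RIGHT"}


def interpret_verb_phrase(words):
    """Map a phrase such as ['walk', 'around', 'left'] to action tokens."""
    head = words[0]
    # "turn" contributes no action of its own; only the direction matters.
    action = [] if head == "turn" else [PRIMITIVES[head]]
    if len(words) == 1:
        if head == "turn":
            raise ValueError("'turn' needs a direction")
        return action
    if len(words) == 2:  # e.g. "walk left", "turn right"
        return [TURNS[words[1]]] + action
    modifier, direction = words[1], TURNS[words[2]]
    if modifier == "opposite":  # turn twice, then act once
        return [direction, direction] + action
    if modifier == "around":  # (turn, act) repeated four times
        return ([direction] + action) * 4
    raise ValueError(f"unknown modifier: {modifier}")


def interpret_sentence(words):
    """Handle an optional trailing 'twice' or 'thrice'."""
    repeat = {"twice": 2, "thrice": 3}.get(words[-1], 1)
    core = words[:-1] if repeat > 1 else words
    return interpret_verb_phrase(core) * repeat


def interpret(command):
    """Interpret a full command with at most one 'and' or 'after'."""
    words = command.split()
    for conj in ("and", "after"):
        if conj in words:
            i = words.index(conj)
            first = interpret_sentence(words[:i])
            second = interpret_sentence(words[i + 1:])
            # "x after y" executes y first, then x.
            return first + second if conj == "and" else second + first
    return interpret_sentence(words)


print(interpret("jump twice"))
print(interpret("walk left after run"))
```

Under these assumptions, `interpret("walk left after run")` yields the actions for "run" followed by a left turn and a walk, which shows how the function words ("after", "twice") and the arguments ("walk", "left") contribute separately to the output.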

Related papers

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks [23.516986266146855]
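We train autoregressive Transformer models on a synthetic data-generating process. We show that autoregressive Transformers can learn compositional structures from small amounts of training data.
Reference translation (metadata) (2023-11-21T21:16:54Z)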
Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
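The recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization ability. We make two key modifications to this model that encourage more disentangled representations and improve its compute and memory efficiency. Specifically, instead of adaptively re-encoding source keys and values at every time step, we disentangle their representations and re-encode keys only periodically.
Reference translation (metadata) (2022-12-12T15:40:30Z)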
Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
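Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models. In this paper, we analyze the behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and a tendency to memorize whole examples. We show empirical improvements using standard sequence-to-sequence models on two widely used compositionality datasets.
Reference translation (metadata) (2022-11-28T17:36:41Z)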
Compositional Generalization and Decomposition in Neural Program Synthesis [59.356261137313275]
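This paper focuses on measuring the ability of learned program synthesizers to generalize compositionally. We first characterize several different axes along which program synthesis methods would be expected to generalize. Based on two popular existing datasets, we introduce a benchmark suite of tasks to assess these abilities.
Reference translation (metadata) (2022-04-07T22:16:05Z)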
Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding [0.0]
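This paper proposes Recursive Decoding (RD), a novel procedure for training and using seq2seq models. Rather than generating an entire output sequence in one pass, models are trained to predict one token at a time. RD yields dramatic improvements on two previously neglected generalization tasks in gSCAN.
Reference translation (metadata) (2022-01-27T19:13:42Z)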
Iterative Decoding for Compositional Generalization in Transformers [5.269770493488338]
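In sequence-to-sequence learning, Transformers are often unable to predict correct outputs for extremely long examples. This paper proposes iterative decoding as an alternative to seq2seq learning. We show that Transformers trained with iterative decoding outperform their seq2seq counterparts on the PCFG dataset.
Reference translation (metadata) (2021-10-08T14:52:25Z)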
Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
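This paper presents an efficient dynamic programming algorithm that performs exact marginal inference over separable permutations. The resulting seq2seq model shows better systematic generalization than standard models on synthetic problems and NLP tasks.
Reference translation (metadata) (2021-06-06T21:53:54Z)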
Sequence-Level Mixed Sample Data Augmentation [119.94667752029143]
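This work proposes a simple data augmentation approach to encourage compositional behavior in neural models for sequence-to-sequence problems. Our approach, SeqMix, creates new synthetic examples by softly combining input/output sequences from the training set.
Reference translation (metadata) (2020-11-18T02:18:04Z)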
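The "soft combination" of sequences in the last entry can be illustrated with a mixup-style sketch. This is only a rough illustration under assumptions not stated in the summary: mixing is applied to one-hot token vectors, the mixing coefficient is drawn from a Beta distribution, and the shorter sequence is padded. The names `one_hot`, `soft_mix`, `alpha`, and `pad_id` are hypothetical and do not come from the SeqMix paper or its code.

```python
import random


def one_hot(index, vocab_size):
    """Return a one-hot list of length vocab_size with a 1.0 at index."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]


def soft_mix(seq_a, seq_b, vocab_size, lam, pad_id=0):
    """Convexly combine two token-id sequences position by position.

    Assumption: the shorter sequence is right-padded with pad_id so that
    both have the same length before mixing.
    """
    length = max(len(seq_a), len(seq_b))
    a = list(seq_a) + [pad_id] * (length - len(seq_a))
    b = list(seq_b) + [pad_id] * (length - len(seq_b))
    return [
        [lam * x + (1.0 - lam) * y
         for x, y in zip(one_hot(i, vocab_size), one_hot(j, vocab_size))]
        for i, j in zip(a, b)
    ]


def mix_pair(example_a, example_b, vocab_size, alpha=1.0, rng=random):
    """Mix two (input, output) pairs with one shared coefficient (assumed)."""
    lam = rng.betavariate(alpha, alpha)
    mixed_input = soft_mix(example_a[0], example_b[0], vocab_size, lam)
    mixed_output = soft_mix(example_a[1], example_b[1], vocab_size, lam)
    return lam, mixed_input, mixed_output


lam, mixed_in, mixed_out = mix_pair(([1, 2, 3], [4, 5]), ([2, 2], [5, 1, 3]), vocab_size=6)
print(round(lam, 3), len(mixed_in), len(mixed_out))
```

Each mixed position is a probability distribution over the vocabulary rather than a single token, so a model consuming it would need to accept soft inputs, for example by taking a weighted average of token embeddings.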

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
