Fugu-MT paper translation (summary): Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models

Paper summary: Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models

arxiv url: http://arxiv.org/abs/2203.09397v1
Date: Thu, 17 Mar 2022 15:46:53 GMT
Status: Translation complete
Last updated in system: 2022-03-18 14:46:12.706688
Title: Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models
Title (reference translation): Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-Sequence Models
Authors: Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster
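Abstract summary: Sequence-to-sequence (seq2seq) models often fail to generalize in a hierarchy-sensitive manner when performing syntactic transformations. Pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not.
Reference score (attention score computed independently by this site): 23.21767225871304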
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Relations between words are governed by hierarchical structure rather than linear ordering. Sequence-to-sequence (seq2seq) models, despite their success in downstream NLP applications, often fail to generalize in a hierarchy-sensitive manner when performing syntactic transformations - for example, transforming declarative sentences into questions. However, syntactic evaluations of seq2seq models have only observed models that were not pre-trained on natural language data before being trained to perform syntactic transformations, in spite of the fact that pre-training has been found to induce hierarchical linguistic generalizations in language models; in other words, the syntactic capabilities of seq2seq models may have been greatly understated. We address this gap using the pre-trained seq2seq models T5 and BART, as well as their multilingual variants mT5 and mBART. We evaluate whether they generalize hierarchically on two transformations in two languages: question formation and passivization in English and German. We find that pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not. This result presents evidence for the learnability of hierarchical syntactic information from non-annotated natural language text while also demonstrating that seq2seq models are capable of syntactic generalization, though only after exposure to much more language data than human learners receive.
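Abstract (reference translation): Relations between words are governed by hierarchical structure rather than by linear order. Despite their success in downstream NLP applications, sequence-to-sequence (seq2seq) models often fail to generalize hierarchically when performing syntactic transformations. However, syntactic evaluations of seq2seq models have only examined models that were not pre-trained on natural language data before being trained to perform syntactic transformations, even though pre-training has been found to induce hierarchical linguistic generalizations in language models; the syntactic capabilities of seq2seq models may therefore have been greatly understated. We address this gap using the pre-trained seq2seq models T5 and BART and their multilingual variants mT5 and mBART. We evaluate whether they generalize hierarchically on two transformations in two languages: question formation and passivization in English and German. Pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not. This result is evidence that hierarchical syntactic information can be learned from non-annotated natural language text, and it shows that seq2seq models are capable of syntactic generalization, though only after exposure to much more language data than human learners receive.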
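The contrast between hierarchical and linear generalization in question formation can be made concrete with a small sketch. It is an illustration only, not the authors' evaluation code: the auxiliary list, the example sentence, the function names (`move_first`, `move_main`, `classify`), and the hand-supplied main-auxiliary index are all assumptions introduced here. A linear rule fronts the first auxiliary in left-to-right order, while a hierarchical rule fronts the main-clause auxiliary; the two only disagree when a relative clause places another auxiliary before the main one.

```python
# Toy illustration (hypothetical, not from the paper): two candidate rules for
# turning an English declarative into a yes/no question.

AUXILIARIES = {"can", "will", "does", "has"}  # assumed toy auxiliary inventory


def move_first(tokens):
    """Linear rule: front the first auxiliary in left-to-right order."""
    idx = next(i for i, tok in enumerate(tokens) if tok in AUXILIARIES)
    return [tokens[idx]] + tokens[:idx] + tokens[idx + 1:]


def move_main(tokens, main_aux_index):
    """Hierarchical rule: front the main-clause auxiliary.

    The index of the main auxiliary is supplied by hand here; a real
    evaluation would derive it from the sentence's syntactic structure.
    """
    i = main_aux_index
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]


def classify(tokens, main_aux_index, prediction):
    """Label a model's output by which rule it is consistent with."""
    hierarchical = move_main(tokens, main_aux_index)
    linear = move_first(tokens)
    if hierarchical == linear and prediction == hierarchical:
        return "ambiguous"  # both rules agree, so the example is uninformative
    if prediction == hierarchical:
        return "hierarchical"
    if prediction == linear:
        return "linear"
    return "other"


# A sentence where the two rules disagree: "can" sits inside the relative
# clause, and "will" (index 5) is the main-clause auxiliary.
declarative = "the dog that can bark will run".split()
print(" ".join(move_first(declarative)))    # linear candidate
print(" ".join(move_main(declarative, 5)))  # hierarchical candidate
```

Sentences without a relative clause on the subject are labeled `ambiguous` because both rules produce the same question; only disambiguating sentences like the one above reveal which generalization a model has adopted.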

Related papers

Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
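Transformers trained on natural language data have been shown to learn its hierarchical structure and to generalize to sentences with unseen syntactic structures. This work investigates sources of inductive bias in transformer models, and in their training, that could cause such generalization behavior to emerge.
Paper reference translation (metadata) (2024-04-25T07:10:29Z)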
How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases [28.58785395946639]
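We show that pre-training can teach language models to rely on hierarchical syntactic features when performing tasks after fine-tuning. We focus on architectural features (depth, width, and number of parameters) and on the genre and size of the pre-training corpus.
Paper reference translation (metadata) (2023-05-31T14:38:14Z)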
Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
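We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference. The approach trains two models: a discriminative parser based on a bracketing transduction grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that translates the aligned phrases one by one.
Paper reference translation (metadata) (2022-11-15T05:22:40Z)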
Structural generalization is hard for sequence-to-sequence models [85.0087839979613]
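Sequence-to-sequence (seq2seq) models have been successful across many NLP tasks. Recent work on compositional generalization has shown that seq2seq models achieve very low accuracy when generalizing to linguistic structures that were not seen in training.
Paper reference translation (metadata) (2022-10-24T09:03:03Z)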
Compositional Generalization Requires Compositional Parsers [69.77216620997305]
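We compare sequence-to-sequence models with models guided by compositional principles on the recent COGS corpus. Structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
Paper reference translation (metadata) (2022-02-24T07:36:35Z)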
Transformers Generalize Linearly [1.7709450506466664]
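We examine patterns of structural generalization in transformer sequence-to-sequence models. Transformers not only fail to generalize hierarchically across a wide variety of grammatical mapping tasks, but also show an even stronger preference for linear generalization than comparable networks.
Paper reference translation (metadata) (2021-09-24T15:48:46Z)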
Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
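We present an efficient dynamic programming algorithm that performs exact marginal inference over separable permutations. The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
Paper reference translation (metadata) (2021-06-06T21:53:54Z)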
Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models [47.42249565529833]
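Humans can learn structural properties of a word from minimal experience. We assess the ability of modern neural language models to reproduce this behavior in English.
Paper reference translation (metadata) (2020-10-12T14:12:37Z)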

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.