Fugu-MT Paper Translation (Summary): Transformers with Competitive Ensembles of Independent Mechanisms

Paper overview: Transformers with Competitive Ensembles of Independent Mechanisms

arxiv url: http://arxiv.org/abs/2103.00336v1
Date: Sat, 27 Feb 2021 21:48:46 GMT
Status: Translation complete
Last updated in system: 2021-03-06 00:30:29.481673
Title: Transformers with Competitive Ensembles of Independent Mechanisms
Authors: Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio
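Abstract summary: We propose a new Transformer layer that divides the hidden representation and parameters into multiple mechanisms, which exchange information through attention. We study TIM on a large-scale BERT model, on the Image Transformer, and on speech enhancement, and find evidence of semantically meaningful specialization as well as improved performance.
Reference score (attention score computed by this site): 97.93090139318294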
License: http://creativecommons.org/licenses/by/4.0/
Abstract: An important development in deep learning from the earliest MLPs has been a move towards architectures with structural inductive biases which enable the model to keep distinct sources of information and routes of processing well-separated. This structure is linked to the notion of independent mechanisms from the causality literature, in which a mechanism is able to retain the same processing as irrelevant aspects of the world are changed. For example, convnets enable separation over positions, while attention-based architectures (especially Transformers) learn which combination of positions to process dynamically. In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which are applied over the entire hidden representation. This potentially throws unrelated sources of information together, and limits the Transformer's ability to capture independent mechanisms. To address this, we propose Transformers with Independent Mechanisms (TIM), a new Transformer layer which divides the hidden representation and parameters into multiple mechanisms, which only exchange information through attention. Additionally, we propose a competition mechanism which encourages these mechanisms to specialize over time steps, and thus be more independent. We study TIM on a large-scale BERT model, on the Image Transformer, and on speech enhancement and find evidence for semantically meaningful specialization as well as improved performance.
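The abstract describes three ingredients:

- the hidden representation at each position is split into mechanisms with their own parameters;
- mechanisms exchange information only through attention;
- a competition step pushes mechanisms to specialize over time steps.

The NumPy snippet below is a minimal sketch of how these pieces could fit together in one layer. It is not the authors' implementation. The class name `TIMLayerSketch` is hypothetical. So are several design choices, all of them simplifying assumptions:

- a single linear score per mechanism for competition;
- using the competition weights to gate the position-wise attention update;
- parameter-free dot-product attention across mechanisms;
- a one-matrix ReLU feed-forward.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TIMLayerSketch:
    """Hypothetical, simplified TIM-style layer (not the authors' code).

    The width-d_model hidden vector at each position is split into n_mech
    mechanisms of width d_mech, each with its own (unshared) parameters.
    """

    def __init__(self, d_model, n_mech, seed=0):
        assert d_model % n_mech == 0, "d_model must be divisible by n_mech"
        self.n_mech = n_mech
        self.d_mech = d = d_model // n_mech
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d)
        # Separate weights per mechanism, each of shape (n_mech, d, d).
        self.Wq, self.Wk, self.Wv, self.Wff = (
            rng.normal(0.0, s, (n_mech, d, d)) for _ in range(4)
        )
        # Assumed competition scorer: one scoring vector per mechanism.
        self.w_comp = rng.normal(0.0, s, (n_mech, d))

    def __call__(self, h):
        T = h.shape[0]
        d = self.d_mech
        # (T, d_model) -> (n_mech, T, d_mech)
        hm = h.reshape(T, self.n_mech, d).transpose(1, 0, 2)

        # Competition: softmax over the mechanism axis at each position,
        # so mechanisms compete for each time step.
        c = softmax(np.einsum("mtd,md->mt", hm, self.w_comp), axis=0)

        # Attention over positions, run independently inside each mechanism;
        # the update is scaled by that mechanism's competition weight.
        q = np.einsum("mtd,mde->mte", hm, self.Wq)
        k = np.einsum("mtd,mde->mte", hm, self.Wk)
        v = np.einsum("mtd,mde->mte", hm, self.Wv)
        att = softmax(np.einsum("mtd,msd->mts", q, k) / np.sqrt(d), axis=-1)
        hm = hm + c[..., None] * np.einsum("mts,msd->mtd", att, v)

        # Attention across mechanisms at the same position: the only channel
        # through which mechanisms exchange information in this sketch.
        a = softmax(np.einsum("mtd,ntd->tmn", hm, hm) / np.sqrt(d), axis=-1)
        hm = hm + np.einsum("tmn,ntd->mtd", a, hm)

        # Position-wise feed-forward with per-mechanism weights (ReLU).
        hm = hm + np.maximum(0.0, np.einsum("mtd,mde->mte", hm, self.Wff))

        # (n_mech, T, d_mech) -> (T, d_model); also return competition weights.
        return hm.transpose(1, 0, 2).reshape(T, -1), c


h = np.random.default_rng(1).normal(size=(5, 8))   # 5 positions, d_model = 8
out, c = TIMLayerSketch(d_model=8, n_mech=4)(h)
print(out.shape, c.shape)  # (5, 8) (4, 5)
```

Splitting the width into mechanisms makes every projection block-diagonal. Each mechanism then owns a d_mech x d_mech matrix per projection, so a projection has d_model² / n_mech parameters in total rather than d_model². Within each layer, the only path between mechanisms is the cross-mechanism attention step.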

Related papers

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis [8.008567379796666]
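The Transformer architecture has undoubtedly revolutionized deep learning. Its core attention block differs in form and function from most other architectural components in deep learning. The root causes of these outward manifestations, and the precise mechanisms that govern them, remain poorly understood.
Paper reference translation (metadata) (2024-10-14T18:15:02Z)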
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures [2.5322020135765464]
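We distinguish between sensory information about the properties of individual objects and relational information about the relationships between objects. We propose an architectural extension of the Transformer framework that features a sensory attention mechanism, which directs the flow of sensory information, and a novel relational attention mechanism, which directs the flow of relational information.
Paper reference translation (metadata) (2024-05-26T23:52:51Z)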
Compete and Compose: Learning Independent Mechanisms for Modular World Models [57.94106862271727]
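We propose COMET, a modular world model that leverages reusable, independent mechanisms across different environments. COMET is trained on multiple dynamically changing environments through a two-step process: competition and composition. We show that, compared with conventional fine-tuning approaches, COMET improves sample efficiency and can adapt to new environments with varying numbers of objects.
Paper reference translation (metadata) (2024-04-23T15:03:37Z)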
Transformer Mechanisms Mimic Frontostriatal Gating Operations When Trained on Human Working Memory Tasks [19.574270595733502]
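We analyze the mechanisms that emerge within a vanilla attention-only Transformer trained on simple sequence modeling tasks. We find that, after training, the self-attention mechanism within the Transformer specializes in a way that mirrors input and output gating mechanisms.
Paper reference translation (metadata) (2024-02-13T04:28:43Z)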
Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling [10.246977481606427]
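We study the mechanisms by which different components of the Transformer, such as dot-product self-attention, affect its expressive power. Our study reveals the roles of critical parameters in the Transformer.
Paper reference translation (metadata) (2024-02-01T11:43:13Z)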
CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
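We propose a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by Transformers, to enhance representation learning. The proposed approach is an end-to-end compressive image sensing method consisting of adaptive sampling and recovery. Experiments demonstrate the effectiveness of a dedicated Transformer architecture for compressive sensing.
Paper reference translation (metadata) (2021-12-31T04:37:11Z)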
Properties from Mechanisms: An Equivariance Perspective on Identifiable Representation Learning [79.4957965474334]
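A key goal of unsupervised representation learning is to "invert" the data-generating process in order to recover its latent properties. This paper asks: "Can we instead identify latent properties by leveraging knowledge of the mechanisms that govern their evolution?" We provide a complete characterization of the sources of non-identifiability as the knowledge about the set of possible mechanisms varies.
Paper reference translation (metadata) (2021-10-29T14:04:08Z)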
LocalViT: Analyzing Locality in Vision Transformers [101.53997555864822]
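This paper studies the effect of the locality mechanism in vision Transformers. We add locality to vision Transformers through the feed-forward network. On ImageNet2012 classification, the locality-enhanced Transformers outperform the baselines.
Paper reference translation (metadata) (2021-04-12T17:59:22Z)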
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers [78.26411729589526]
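We propose the first method for explaining predictions made by Transformer architectures. Our method outperforms existing methods that were adapted from single-modality explainability.
Paper reference translation (metadata) (2021-03-29T15:03:11Z)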
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer [89.21584915290319]
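We propose a self-attention attribution method to interpret the information interactions inside the Transformer. We show that the attribution results can be used as adversarial patterns to carry out non-targeted attacks on BERT.
Paper reference translation (metadata) (2020-04-23T14:58:22Z)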

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

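This is the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).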