Fugu-MT Paper Translation (Summary): Multimodal Learning with Transformers: A Survey

arxiv url: http://arxiv.org/abs/2206.06488v2
Date: Wed, 10 May 2023 02:11:30 GMT
Status: Translation complete
Last updated on this site: 2023-05-11 17:45:50.259296
Title: Multimodal Learning with Transformers: A Survey
Authors: Peng Xu, Xiatian Zhu, and David A. Clifton
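Abstract summary: Transformer is a promising neural network learner and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI research.
Reference score (attention score computed by this site): 43.71023129374107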
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformer is a promising neural network learner, and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI research. This paper presents a comprehensive survey of Transformer techniques oriented at multimodal data. The main contents of this survey include: (1) a background of multimodal learning, Transformer ecosystem, and the multimodal big data era, (2) a theoretical review of Vanilla Transformer, Vision Transformer, and multimodal Transformers, from a geometrically topological perspective, (3) a review of multimodal Transformer applications, via two important paradigms, i.e., for multimodal pretraining and for specific multimodal tasks, (4) a summary of the common challenges and designs shared by the multimodal Transformer models and applications, and (5) a discussion of open problems and potential research directions for the community.

Related Papers

HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation [69.34266162474836]
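This paper investigates an efficient training paradigm for building a single transformer that unifies multimodal understanding and generation. Feature pre-scaling and multimodal AdaLN techniques are introduced to address cross-modal compatibility challenges. The paper introduces HaploOmni, a new multimodal transformer.
Reference translation (metadata) (2025-06-03T15:14:00Z)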
3M-TRANSFORMER: A Multi-Stage Multi-Stream Multimodal Transformer for Embodied Turn-Taking Prediction [4.342241136871849]
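This paper proposes a turn-taking prediction method for multi-perspective data using a multimodal transformer. Experimental results on the recently introduced EgoCom dataset show a significant average performance improvement of 14.01%.
Reference translation (metadata) (2023-10-23T12:29:10Z)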
Exchanging-based Multimodal Fusion with Transformer [19.398692598523454]
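This paper studies the problem of multimodal fusion. Exchanging-based methods have recently been proposed for vision-vision fusion; they aim to exchange embeddings learned from one modality to the other. This paper proposes MuSE, an exchanging-based multimodal fusion model with Transformer for text-vision fusion.
Reference translation (metadata) (2023-09-05T12:48:25Z)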
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
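A Transformer is a deep neural network that uses a self-attention mechanism to understand contextual relationships within sequential data. Transformer models handle long dependencies between input sequence elements and enable parallel processing. Our survey identifies the top five application domains for Transformer-based models.
Reference translation (metadata) (2023-06-11T23:13:51Z)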
Multilevel Transformer For Multimodal Emotion Recognition [6.0149102420697025]
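This paper proposes a novel multi-granularity framework that combines fine-grained representations with pre-trained utterance-level representations. Inspired by Transformer TTS, the authors propose a multilevel transformer model.
Reference translation (metadata) (2022-10-26T10:31:24Z)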
Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos [58.93586436289648]
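We propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis. The model outperforms existing approaches on unaligned multimodal sequences and performs strongly on aligned multimodal sequences.
Reference translation (metadata) (2022-06-16T07:47:57Z)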
Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
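Multimodal knowledge graphs (MKGs) organize visual-text factual knowledge. MKGformer achieves state-of-the-art (SOTA) performance on four datasets covering multimodal link prediction, multimodal relation extraction (RE), and multimodal named entity recognition (NER).
Reference translation (metadata) (2022-05-04T23:40:04Z)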
UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
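We design a single architecture that fits tasks and make the first attempt at a universal multi-agent reinforcement learning pipeline. Unlike previous RNN-based models, we use a transformer-based model to generate a flexible policy. The proposed method, named UPDeT (Universal Policy Decoupling Transformer), relaxes action restrictions and makes the decision-making process of multi-agent tasks more explainable.
Reference translation (metadata) (2021-01-20T07:24:24Z)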
Transformers in Vision: A Survey [101.07348618962111]
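Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences. Transformers require minimal inductive biases in their design and are naturally suited as set functions. This survey aims to provide an overview of Transformer models in the field of computer vision.
Reference translation (metadata) (2021-01-04T18:57:24Z)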

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
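Two of the related abstracts describe Transformers as relying on self-attention over sequence elements and as being naturally suited to set functions. As a minimal sketch of what that means, the block below implements single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, in plain Python. The helper names `matmul`, `softmax`, and `self_attention` and the toy weight matrices `wq`, `wk`, `wv` are hypothetical illustrations, not code from any of the listed papers, and no positional encoding is used. Under those assumptions, permuting the input tokens only permutes the output rows.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V.

    x holds one row per token; no positional encoding is added (an assumption).
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # K transposed: d_k x n
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)


if __name__ == "__main__":
    # Toy inputs: 3 tokens with d_model = 2 and hypothetical projection weights.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    wq = [[1.0, 0.5], [0.0, 1.0]]
    wk = [[0.5, 0.0], [1.0, 1.0]]
    wv = [[1.0, 2.0], [3.0, 4.0]]

    out = self_attention(x, wq, wk, wv)

    # Feed the same tokens in a different order.
    perm = [2, 0, 1]
    out_perm = self_attention([x[i] for i in perm], wq, wk, wv)

    # Row i of the permuted run should match row perm[i] of the original run.
    equivariant = all(
        math.isclose(a, b, abs_tol=1e-12)
        for i, j in enumerate(perm)
        for a, b in zip(out_perm[i], out[j])
    )
    print("permutation equivariant:", equivariant)
```

Adding positional encodings to `x` would break this symmetry, which is why sequence models inject token order explicitly rather than getting it from attention itself.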

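The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.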