Fugu-MT paper translation (abstract): Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

Paper overview: Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

arxiv url: http://arxiv.org/abs/2306.04539v1
Date: Wed, 7 Jun 2023 15:44:53 GMT
Status: translation complete
Last updated in system: 2023-06-08 13:32:14.983648
Title: Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications
Title (reference translation): Multimodal learning without labeled multimodal data: guarantees and applications
Authors: Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alex Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov
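Abstract summary: In many machine learning systems that learn jointly from multiple modalities, understanding the nature of multimodal interactions is a central research question. We study this interaction-quantification problem in a semi-supervised setting with only labeled unimodal data. Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds that quantify the amount of multimodal interaction.
Reference score (attention score computed independently by this site): 97.79283975518047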
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: the emergence of new task-relevant information during learning from both modalities that was not present in either alone. We study this challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data and naturally co-occurring multimodal data (e.g., unlabeled images and captions, video and corresponding audio) but when labeling them is time-consuming. Using a precise information-theoretic definition of interactions, our key contributions are the derivations of lower and upper bounds to quantify the amount of multimodal interactions in this semi-supervised setting. We propose two lower bounds based on the amount of shared information between modalities and the disagreement between separately trained unimodal classifiers, and derive an upper bound through connections to approximate algorithms for min-entropy couplings. We validate these estimated bounds and show how they accurately track true interactions. Finally, two semi-supervised multimodal applications are explored based on these theoretical results: (1) analyzing the relationship between multimodal performance and estimated interactions, and (2) self-supervised learning that embraces disagreement between modalities beyond agreement as is typically done.
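The abstract describes interactions as task-relevant information that appears only when both modalities are used together. A minimal sketch of that idea, assuming a toy XOR task chosen here purely for illustration (it is not taken from the paper, and it does not compute the paper's decomposition or bounds): each input alone carries no information about the label, while the pair determines it. The helper name `mutual_information_bits` is hypothetical.

```python
import itertools

import numpy as np


def mutual_information_bits(joint):
    """Mutual information I(A;B) in bits for a 2-D joint table p(a, b)."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)  # marginal p(a), shape (n, 1)
    pb = joint.sum(axis=0, keepdims=True)  # marginal p(b), shape (1, m)
    independent = pa @ pb                  # product of marginals, shape (n, m)
    mask = joint > 0                       # skip zero cells (0 * log 0 := 0)
    return float((joint[mask] * np.log2(joint[mask] / independent[mask])).sum())


# Toy joint distribution p[x1, x2, y] with y = x1 XOR x2 and uniform inputs.
p = np.zeros((2, 2, 2))
for x1, x2 in itertools.product((0, 1), repeat=2):
    p[x1, x2, x1 ^ x2] = 0.25

i_x1_y = mutual_information_bits(p.sum(axis=1))       # I(X1; Y)
i_x2_y = mutual_information_bits(p.sum(axis=0))       # I(X2; Y)
i_both_y = mutual_information_bits(p.reshape(4, 2))   # I(X1, X2; Y)

print(f"I(X1;Y)    = {i_x1_y:.3f} bits")
print(f"I(X2;Y)    = {i_x2_y:.3f} bits")
print(f"I(X1,X2;Y) = {i_both_y:.3f} bits")
```

In this toy case all of the one bit about `y` is available only from the two inputs jointly, which is the kind of information that unimodal labels alone cannot reveal directly.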
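Abstract (reference translation): In many machine learning systems that learn jointly from multiple modalities, a central research question is to understand the nature of multimodal interactions. We study this challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data and naturally co-occurring multimodal data (e.g., unlabeled images and captions, video and corresponding audio). Using a precise information-theoretic definition of interactions, our key contributions are the derivations of lower and upper bounds that quantify the amount of multimodal interaction in this semi-supervised setting. We propose two lower bounds, based on the amount of information shared between modalities and on the disagreement between separately trained unimodal classifiers, and derive an upper bound through connections to approximate algorithms for min-entropy couplings. We validate these estimated bounds and show how they accurately track true interactions. Finally, we explore two semi-supervised multimodal applications: (1) analyzing the relationship between multimodal performance and estimated interactions, and (2) self-supervised learning that accounts for disagreement between modalities, beyond agreement.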
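The abstract mentions approximate algorithms for min-entropy couplings. As a rough illustration of what such a coupling is, the sketch below builds a joint distribution with two given marginals using a simple greedy heuristic (repeatedly pairing the largest remaining masses). This is an assumption-laden illustration, not the paper's procedure or its upper bound; the function names `greedy_coupling` and `entropy_bits` and the example marginals are hypothetical.

```python
import numpy as np


def greedy_coupling(p, q, tol=1e-12):
    """Greedy heuristic for a low-entropy coupling of marginals p and q.

    Repeatedly matches the largest remaining mass of p with the largest
    remaining mass of q and moves the smaller of the two into the joint.
    The result has marginals p and q; it is an approximation and is not
    guaranteed to be the exact minimum-entropy coupling.
    """
    p = np.asarray(p, dtype=float).copy()
    q = np.asarray(q, dtype=float).copy()
    joint = np.zeros((p.size, q.size))
    while p.max() > tol and q.max() > tol:
        i, j = int(p.argmax()), int(q.argmax())
        mass = min(p[i], q[j])
        joint[i, j] += mass
        p[i] -= mass  # at least one of p[i], q[j] becomes exactly zero
        q[j] -= mass
    return joint


def entropy_bits(dist):
    """Shannon entropy in bits of a (possibly multi-dimensional) distribution."""
    d = np.asarray(dist, dtype=float).ravel()
    d = d[d > 0]
    return float(-(d * np.log2(d)).sum())


# Hypothetical marginals, chosen only for illustration.
p = np.array([0.6, 0.4])
q = np.array([0.5, 0.3, 0.2])

joint = greedy_coupling(p, q)
independent = np.outer(p, q)

print("greedy coupling:\n", joint)
print("entropy of greedy coupling (bits):", entropy_bits(joint))
print("entropy of independent coupling (bits):", entropy_bits(independent))
```

The greedy joint keeps both marginals intact while concentrating mass in few cells, so its entropy is lower than that of the independent coupling for these marginals.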

Related papers

MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping [28.653290360671175]
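We introduce MINT, a simple yet surprisingly effective task-grouping strategy based on the type of multimodal interaction. We show that the proposed method significantly outperforms existing task-grouping baselines for multimodal instruction tuning.
Paper reference translation (metadata) (2025-06-02T22:55:23Z)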
Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification [3.6616868775630587]
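We propose M3CoL, a multimodal mixup contrastive learning approach that captures the nuanced shared relations inherent in multimodal data. Our work highlights the importance of learning shared relations for robust multimodal learning and opens promising avenues for future research.
Paper reference translation (metadata) (2024-09-26T12:15:13Z)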
What to align in multimodal contrastive learning? [7.7439394183358745]
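We introduce a contrastive multimodal learning strategy that enables communication between modalities in a single multimodal space. We show that shared, synergistic, and unique terms of information emerge naturally from this formulation, allowing multimodal interactions beyond redundancy to be estimated. CoMM learns complex multimodal interactions and achieves state-of-the-art results on six multimodal benchmarks.
Paper reference translation (metadata) (2024-09-11T16:42:22Z)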
Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
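Unsupervised pre-training has achieved great success in skeleton-based action understanding. We propose a unified multimodal unsupervised representation learning framework called UmURL. UmURL uses an efficient early-fusion strategy to jointly encode multimodal features in a single stream.
Paper reference translation (metadata) (2023-11-06T13:56:57Z)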
Learning Unseen Modality Interaction [54.23533023883659]
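Multimodal learning assumes that all modality combinations are available during training in order to learn cross-modal correspondences. We pose the problem of unseen modality interaction and introduce a first solution. It uses a module that projects the multidimensional features of different modalities into a common space in which rich information is preserved.
Paper reference translation (metadata) (2023-06-22T10:53:10Z)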
Align and Attend: Multimodal Summarization with Dual Contrastive Losses [57.83012574678091]
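The goal of multimodal summarization is to extract the most important information from different modalities to form an output summary. Existing methods fail to exploit the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples. A2Summ (Align and Attend Multimodal Summarization) is a unified multimodal transformer model that can effectively align and attend to multimodal inputs.
Paper reference translation (metadata) (2023-03-13T17:01:42Z)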
Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework [89.8609061423685]
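We propose an information-theoretic approach that quantifies the degree of redundancy, uniqueness, and synergy relating input modalities to an output task. To validate PID (partial information decomposition) estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and large-scale multimodal benchmarks. We show its usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches to model selection, and (4) real-world case studies.
Paper reference translation (metadata) (2023-02-23T18:59:05Z)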
Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
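We develop a contrastive framework for generative model learning, allowing the model to be trained not only on the commonality between modalities but also on the distinction between "related" and "unrelated" multimodal data. Under the proposed approach, the generative model can accurately distinguish related samples from unrelated ones, which makes it possible to use unlabeled multimodal data.
Paper reference translation (metadata) (2020-07-02T15:08:11Z)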

The related papers list is generated automatically from the titles and abstracts of papers on this site.
