Fugu-MT Paper Translation (Summary): Causal Attention for Vision-Language Tasks

Paper summary: Causal Attention for Vision-Language Tasks

arxiv url: http://arxiv.org/abs/2103.03493v1
Date: Fri, 5 Mar 2021 06:38:25 GMT
Status: Translation complete
Last updated in system: 2021-03-08 19:36:08.251794
Title: Causal Attention for Vision-Language Tasks
Title (reference translation): Causal Attention for Vision-Language Tasks
Authors: Xu Yang, Hanwang Zhang, Guojun Qi, Jianfei Cai
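Abstract summary: We introduce a new attention mechanism, Causal Attention (CATT). CATT removes the ever-elusive confounding effect in existing attention-based vision-language models. In particular, CATT shows great potential in large-scale pre-training.
Reference score (independently computed attention level): 142.82608295995652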
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: We present a novel attention mechanism: Causal Attention (CATT), to remove the ever-elusive confounding effect in existing attention-based vision-language models. This effect causes harmful bias that misleads the attention module to focus on the spurious correlations in training data, damaging the model generalization. As the confounder is unobserved in general, we use the front-door adjustment to realize the causal intervention, which does not require any knowledge of the confounder. Specifically, CATT is implemented as a combination of 1) In-Sample Attention (IS-ATT) and 2) Cross-Sample Attention (CS-ATT), where the latter forcibly brings other samples into every IS-ATT, mimicking the causal intervention. CATT abides by the Q-K-V convention and hence can replace any attention module such as top-down attention and self-attention in Transformers. CATT improves various popular attention-based vision-language models by considerable margins. In particular, we show that CATT has great potential in large-scale pre-training, e.g., it can lift the lighter LXMERT~\cite{tan2019lxmert}, which uses less data and less computational power, to performance comparable to the heavier UNITER~\cite{chen2020uniter}. Code is published at https://github.com/yangxuntu/catt.
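Abstract (reference translation): This paper proposes a new attention mechanism, Causal Attention (CATT), which removes the confounding effect in existing attention-based vision-language models. This effect causes harmful bias: the attention module focuses on spurious correlations in the training data, which damages model generalization. Because the confounder is generally unobserved, we use the front-door adjustment to realize the causal intervention. Specifically, CATT is implemented as a combination of (1) In-Sample Attention (IS-ATT) and (2) Cross-Sample Attention (CS-ATT). CATT follows the Q-K-V convention, so it can replace attention modules such as top-down attention and self-attention in Transformers. CATT considerably improves a variety of attention-based vision-language models. In particular, we show that it has great potential in large-scale pre-training: for example, it can make the lighter LXMERT~\cite{tan2019lxmert}, which uses less data and less computational power, comparable to the heavier UNITER~\cite{chen2020uniter}. Code is available at https://github.com/yangxuntu/catt.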
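The abstract describes CATT as a combination of In-Sample Attention and Cross-Sample Attention, both following the Q-K-V convention. The sketch below is only an illustration of that idea under stated assumptions, not the authors' implementation: the names `catt_sketch` and `global_dict` are hypothetical, the "other samples" are stood in for by a small random dictionary of feature vectors, and fusing the two outputs by concatenation is an assumption that the abstract does not confirm.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention (Q-K-V): returns one output row per query."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Similarity of this query to every key, then normalized weights.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def catt_sketch(queries, sample_feats, global_dict):
    """Hypothetical CATT-style block (assumption, not the paper's code)."""
    # IS-ATT: keys and values come from the current sample.
    is_out = attention(queries, sample_feats, sample_feats)
    # CS-ATT: keys and values come from a dictionary standing in for other samples.
    cs_out = attention(queries, global_dict, global_dict)
    # Assumed fusion: concatenate the two outputs for each query.
    return [a + b for a, b in zip(is_out, cs_out)]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4
queries = rand_matrix(2, dim, rng)       # 2 query vectors
sample_feats = rand_matrix(5, dim, rng)  # 5 features from the current sample
global_dict = rand_matrix(8, dim, rng)   # 8 entries standing in for other samples
fused = catt_sketch(queries, sample_feats, global_dict)
print(len(fused), len(fused[0]))  # 2 8
```

Because both branches take queries, keys, and values in the usual way, a block shaped like this could in principle be dropped in wherever an ordinary attention call is used, which is the property the abstract highlights.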

Related papers

CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction [42.92011330807996]
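CTR-Sink is a novel framework that introduces behavior-level attention sinks tailored to recommendation scenarios. Inspired by attention sink theory, it constructs attention-focus sinks and dynamically regulates attention aggregation via external information.
Paper reference translation (metadata) (2025-08-05T17:30:34Z)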
Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer [54.97718043685824]
We introduce the Hadamard Attention Recurrent Stereo Transformer (HART). HART includes a novel attention mechanism that incorporates several components. In reflective regions, HART ranked first on the KITTI 2012 benchmark.
Paper reference translation (metadata) (2025-01-02T02:51:16Z)
More Expressive Attention with Negative Weights [36.40344438470477]
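We propose Cog Attention, a novel attention mechanism that allows attention weights to be negative in order to enhance expressiveness. Our approach suggests a promising research direction for rethinking and breaking the constraints of traditional softmax attention.
Paper reference translation (metadata) (2024-11-11T17:56:28Z)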
Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models [64.67721492968941]
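We propose Text-Guided Attention for Zero-Shot Robustness (TGA-ZSR). Our goal is to maintain the generalization of the CLIP model while enhancing its adversarial robustness. Our method improves zero-shot accuracy by 9.58% over the current state-of-the-art techniques.
Paper reference translation (metadata) (2024-10-29T07:15:09Z)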
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality [20.41579586967349]
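Multimodal Large Language Models (MLLMs) have become a focus of both industry and academia. MLLMs often suffer from biases introduced by visual and language priors, which can lead to multimodal hallucination. We propose CausalMM, a causal inference framework that applies structural causal modeling to MLLMs.
Paper reference translation (metadata) (2024-10-07T06:45:22Z)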
Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement [68.31147013783387]
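We observe that the attention mechanism is vulnerable to patch-based adversarial attacks. We propose a Robust Attention Mechanism (RAM) to improve the robustness of semantic segmentation models.
Paper reference translation (metadata) (2024-01-03T13:58:35Z)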
Guiding Visual Question Answering with Attention Priors [76.21671164766073]
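This paper proposes guiding the attention mechanism with linguistic-visual grounding. The grounding is derived by connecting structured linguistic concepts in the query to their referents among the visual objects. The algorithm probes attention-based reasoning models, injects relevant associative knowledge, and regulates the core reasoning process.
Paper reference translation (metadata) (2022-05-25T09:53:47Z)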
A Context-Aware Feature Fusion Framework for Punctuation Restoration [28.38472792385083]
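To alleviate the shortage of attention, we propose a novel feature fusion framework based on two types of attention (FFA). Experiments on the popular benchmark dataset IWSLT show that our approach is effective.
Paper reference translation (metadata) (2022-03-23T15:29:28Z)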
Impact of Attention on Adversarial Robustness of Image Classification Models [0.9176056742068814]
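Adversarial attacks against deep learning models have gained significant attention. Recent work has proposed explanations for the existence of adversarial examples and techniques for defending models against these attacks. This work aims at a general understanding of the impact of attention on adversarial robustness.
Paper reference translation (metadata) (2021-09-02T13:26:32Z)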
Causal Attention for Unbiased Visual Recognition [76.87114090435618]
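Attention modules do not always help deep models learn causal features that are robust in any context. This paper proposes the causal attention module (CaaM). In OOD settings, deep models with CaaM significantly outperform those without it.
Paper reference translation (metadata) (2021-08-19T16:45:51Z)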
More Than Just Attention: Learning Cross-Modal Attentions with Contrastive Constraints [63.08768589044052]
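We propose the Contrastive Content Re-sourcing (CCR) and Contrastive Content Swapping (CCS) constraints. The CCR and CCS constraints supervise the training of attention models in a contrastive learning manner without requiring explicit attention annotations. Experiments on the Flickr30k and MS-COCO datasets show that integrating these attention constraints into two state-of-the-art attention-based models improves model performance.
Paper reference translation (metadata) (2021-05-20T08:48:10Z)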

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.
