Fugu-MT Paper Translation (Abstract): OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization

Paper abstract: OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization

arxiv url: http://arxiv.org/abs/2312.04403v1
Date: Thu, 7 Dec 2023 16:16:50 GMT
Status: Translation complete
Last updated in system: 2023-12-08 14:20:42.721161
Title: OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization
Title (reference translation): OT-Attack: Enhancing the Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization
Authors: Dongchen Han, Xiaojun Jia, Yang Bai, Jindong Gu, Yang Liu, and Xiaochun Cao
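Abstract summary: Vision-language pre-training models are vulnerable to multi-modal adversarial examples. Recent work has shown that leveraging data augmentation and image-text modal interactions enhances the transferability of adversarial examples. This paper proposes an optimal-transport-based adversarial attack called OT-Attack.
Reference score (independently computed attention metric): 65.57380193070574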
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language pre-training (VLP) models demonstrate impressive abilities in processing both images and text. However, they are vulnerable to multi-modal adversarial examples (AEs). Investigating the generation of high-transferability adversarial examples is crucial for uncovering VLP models' vulnerabilities in practical scenarios. Recent works have indicated that leveraging data augmentation and image-text modal interactions can enhance the transferability of adversarial examples for VLP models significantly. However, they do not consider the optimal alignment problem between data-augmented image-text pairs. This oversight leads to adversarial examples that are overly tailored to the source model, thus limiting improvements in transferability. In our research, we first explore the interplay between image sets produced through data augmentation and their corresponding text sets. We find that augmented image samples can align optimally with certain texts while exhibiting less relevance to others. Motivated by this, we propose an Optimal Transport-based Adversarial Attack, dubbed OT-Attack. The proposed method formulates the features of image and text sets as two distinct distributions and employs optimal transport theory to determine the most efficient mapping between them. This optimal mapping informs our generation of adversarial examples to effectively counteract the overfitting issues. Extensive experiments across various network architectures and datasets in image-text matching tasks reveal that our OT-Attack outperforms existing state-of-the-art methods in terms of adversarial transferability.
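Abstract (reference translation): Vision-language pre-training (VLP) models show a strong ability to process both images and text. However, they are vulnerable to multi-modal adversarial examples (AEs). To uncover the vulnerabilities of VLP models in practical scenarios, it is essential to investigate how to generate highly transferable adversarial examples. Recent work has shown that leveraging data augmentation and image-text modal interactions can significantly improve the transferability of adversarial examples for VLP models. However, the optimal alignment problem between data-augmented image-text pairs is not considered. This oversight yields adversarial examples that are overfitted to the source model, which limits improvements in transferability. In this work, we first examine the interplay between image sets produced by data augmentation and their corresponding text sets. Augmented image samples can align optimally with certain texts while being less relevant to others. We therefore propose an optimal-transport-based adversarial attack called OT-Attack. The proposed method formulates the features of the image and text sets as two distinct distributions and uses optimal transport theory to determine the most efficient mapping between them. This optimal mapping guides the generation of adversarial examples so as to counteract the overfitting problem effectively. Extensive experiments across various network architectures and datasets on image-text matching tasks show that OT-Attack outperforms existing state-of-the-art methods in terms of adversarial transferability.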
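The abstract describes treating image-set and text-set features as two distributions and finding an optimal transport mapping between them, but it does not say which solver or cost is used. As a rough illustration only, the sketch below computes an entropy-regularized transport plan with Sinkhorn iterations. The cosine-distance cost, the uniform marginals, the random stand-in features, and the names `cosine_cost` and `sinkhorn_plan` are all assumptions made for this example, not details taken from the paper.

```python
import math
import random


def cosine_cost(image_feats, text_feats):
    """Cost C[i][j] = 1 - cosine similarity between image i and text j (assumed cost)."""
    def unit(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    imgs = [unit(v) for v in image_feats]
    txts = [unit(v) for v in text_feats]
    return [[1.0 - sum(p * q for p, q in zip(i, t)) for t in txts] for i in imgs]


def sinkhorn_plan(cost, eps=0.5, n_iters=500):
    """Entropy-regularized OT plan between two uniformly weighted sets."""
    m, n = len(cost), len(cost[0])
    a = [1.0 / m] * m  # uniform mass on each image (assumption)
    b = [1.0 / n] * n  # uniform mass on each text (assumption)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(n_iters):
        # Alternately rescale so column sums match b, then row sums match a.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(m)) for j in range(n)]
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(n)) for i in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n)] for i in range(m)]


# Random stand-ins for encoder outputs: 4 augmented images, 5 texts, 16-dim features.
random.seed(0)
image_feats = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]
text_feats = [[random.gauss(0, 1) for _ in range(16)] for _ in range(5)]

cost = cosine_cost(image_feats, text_feats)
plan = sinkhorn_plan(cost)
# Total transport cost under the plan; an attack could use a quantity like this as its objective.
ot_cost = sum(plan[i][j] * cost[i][j] for i in range(4) for j in range(5))
```

Each entry `plan[i][j]` is the share of mass moved from augmented image `i` to text `j`, so better-matched pairs receive more weight than poorly matched ones. A smaller `eps` gives a sharper plan but converges more slowly.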

Related papers