Fugu-MT Paper Translation (Summary): Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM

Paper summary: Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM

arxiv url: http://arxiv.org/abs/2505.23828v1
Date: Wed, 28 May 2025 07:44:10 GMT
Status: Translation complete
Last updated in system: 2025-06-02 19:47:52.544524
Title: Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM
Title (reference translation): Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM
Authors: Lei Yu, Yechao Zhang, Ziqi Zhou, Yang Wu, Wei Wan, Minghui Li, Shengshan Hu, Pei Xiaobing, Jing Wang
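Abstract summary: This paper proposes Spa-VLM (Stealthy Poisoning Attack on RAG-based VLM), a new poisoning attack paradigm for large models. The authors craft malicious multi-modal knowledge entries, containing adversarial images and misleading text, and inject them into the RAG knowledge base. The results show an attack success rate exceeding 0.8.
Reference score (attention score computed independently by this site): 23.316684225491002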
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: With the rapid development of the Vision-Language Model (VLM), significant progress has been made in Visual Question Answering (VQA) tasks. However, existing VLM often generate inaccurate answers due to a lack of up-to-date knowledge. To address this issue, recent research has introduced Retrieval-Augmented Generation (RAG) techniques, commonly used in Large Language Models (LLM), into VLM, incorporating external multi-modal knowledge to enhance the accuracy and practicality of VLM systems. Nevertheless, the RAG in LLM may be susceptible to data poisoning attacks. RAG-based VLM may also face the threat of this attack. This paper first reveals the vulnerabilities of the RAG-based large model under poisoning attack, showing that existing single-modal RAG poisoning attacks have a 100% failure rate in multi-modal RAG scenarios. To address this gap, we propose Spa-VLM (Stealthy Poisoning Attack on RAG-based VLM), a new paradigm for poisoning attacks on large models. We carefully craft malicious multi-modal knowledge entries, including adversarial images and misleading text, which are then injected into the RAG's knowledge base. When users access the VLM service, the system may generate misleading outputs. We evaluate Spa-VLM on two Wikipedia datasets and across two different RAGs. Results demonstrate that our method achieves highly stealthy poisoning, with the attack success rate exceeding 0.8 after injecting just 5 malicious entries into knowledge bases with 100K and 2M entries, outperforming state-of-the-art poisoning attacks designed for RAG-based LLMs. Additionally, we evaluated several defense mechanisms, all of which ultimately proved ineffective against Spa-VLM, underscoring the effectiveness and robustness of our attack.
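Abstract (reference translation): With the rapid development of Vision-Language Models (VLMs), significant progress has been made on Visual Question Answering (VQA) tasks. However, existing VLMs often produce inaccurate answers because they lack up-to-date knowledge. To address this, recent research has brought Retrieval-Augmented Generation (RAG) techniques, commonly used with Large Language Models (LLMs), into VLMs, incorporating external multi-modal knowledge to improve the accuracy and practicality of VLM systems. Nevertheless, RAG in LLMs may be susceptible to data poisoning attacks, and RAG-based VLMs may face the same threat. The authors first show that existing single-modal RAG poisoning attacks have a 100% failure rate in multi-modal RAG scenarios. To address this gap, they propose Spa-VLM (Stealthy Poisoning Attack on RAG-based VLM), a new poisoning attack paradigm for large models. They carefully craft malicious multi-modal knowledge entries, containing adversarial images and misleading text, and inject them into the RAG knowledge base. When users access the VLM service, the system may generate misleading outputs. Spa-VLM is evaluated on two Wikipedia datasets and across two different RAGs. The attack success rate exceeds 0.8 after injecting just 5 malicious entries into knowledge bases with 100K and 2M entries, outperforming state-of-the-art poisoning attacks designed for RAG-based LLMs. The authors also evaluated several defense mechanisms, all of which ultimately proved ineffective against Spa-VLM, underscoring the effectiveness and robustness of the attack.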
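To put the reported numbers in perspective, the following is a minimal sketch of the arithmetic behind "5 malicious entries in knowledge bases with 100K and 2M entries" and of one plausible way to compute an attack success rate. The function names `poison_fraction` and `attack_success_rate` are hypothetical, and the substring-match definition of success is an assumption made for illustration; the abstract does not state how the paper measures it.

```python
from typing import Sequence


def poison_fraction(num_poisoned: int, clean_kb_size: int) -> float:
    """Share of the knowledge base made up of injected entries.

    Assumption: the stated knowledge base size counts clean entries only,
    so the poisoned entries are added on top of it.
    """
    if num_poisoned < 0 or clean_kb_size <= 0:
        raise ValueError("num_poisoned must be >= 0 and clean_kb_size > 0")
    return num_poisoned / (clean_kb_size + num_poisoned)


def attack_success_rate(outputs: Sequence[str], target_answers: Sequence[str]) -> float:
    """Fraction of targeted queries whose output contains the attacker's answer.

    Assumption: success is a case-insensitive substring match. The paper may
    use a different criterion.
    """
    if not outputs or len(outputs) != len(target_answers):
        raise ValueError("outputs and target_answers must be non-empty and equal in length")
    hits = sum(t.lower() in o.lower() for o, t in zip(outputs, target_answers))
    return hits / len(outputs)


if __name__ == "__main__":
    # Knowledge base sizes taken from the abstract (100K and 2M entries).
    for kb_size in (100_000, 2_000_000):
        print(f"{kb_size:>9,} entries: {poison_fraction(5, kb_size):.6%} poisoned")
```

Under these assumptions the injected entries make up roughly 0.005% of the 100K knowledge base and roughly 0.00025% of the 2M one, which is the sense in which the abstract describes the poisoning as highly stealthy.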

Related papers

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought [58.321044666612174]
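Vad-R1 is an end-to-end MLLM-based framework for video anomaly reasoning. The authors design a Perception-to-Cognition Chain-of-Thought (P2C-CoT) that simulates the human process of recognizing anomalies. They also propose AVA-GRPO, an improved reinforcement learning algorithm that explicitly incentivizes the anomaly reasoning capability of MLLMs.
Paper reference translation (metadata) (2025-05-26T12:05:16Z)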
Benchmarking Poisoning Attacks against Retrieval-Augmented Generation [12.573766276297441]
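Retrieval-Augmented Generation (RAG) has proven effective at mitigating hallucinations in large language models by incorporating external knowledge during inference. The authors propose the first comprehensive benchmark framework for evaluating poisoning attacks against RAG.
Paper reference translation (metadata) (2025-05-24T06:17:59Z)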
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation [71.32665836294103]
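Multimodal retrieval-augmented generation (RAG) improves the visual reasoning capabilities of vision-language models (VLMs). This work introduces Poisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems.
Paper reference translation (metadata) (2025-03-08T15:46:38Z)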
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
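Retrieval Augmented Generation (RAG) enhances multimodal large language models (MLLMs) by grounding their responses in query-relevant external knowledge. This reliance creates a critical yet underexplored safety risk: knowledge poisoning attacks. The paper proposes MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
Paper reference translation (metadata) (2025-02-25T04:23:59Z)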
RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis [3.706288937295861]
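RevPRAG is a flexible, automated detection pipeline that leverages LLM activations. Results across multiple benchmark datasets and RAG architectures show that the approach achieves a 98% true positive rate while keeping the false positive rate close to 1%.
Paper reference translation (metadata) (2024-11-28T06:29:46Z)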
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
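Large Vision-Language Models (LVLMs) show remarkable capabilities across multimodal understanding and reasoning tasks. The vulnerabilities of LVLMs remain relatively underexplored, posing potential security risks in everyday use. This paper surveys the various forms of existing LVLM attacks.
Paper reference translation (metadata) (2024-07-10T06:57:58Z)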
PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models [45.409248316497674]
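Large language models (LLMs) have achieved remarkable success thanks to their exceptional generative capabilities. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique for mitigating their limitations. The knowledge database in a RAG system introduces a new and practical attack surface. Building on this attack surface, the authors propose PoisonedRAG, the first knowledge corruption attack on RAG.
Paper reference translation (metadata) (2024-02-12T18:28:36Z)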
Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks [10.732558183444985]
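Malicious actors can covertly exploit vulnerabilities in large language models (LLMs) through poisoning attacks aimed at generating undesirable outputs. This paper investigates various poisoning techniques to assess their effectiveness across a range of generative tasks. The study shows that an LLM can be poisoned during the fine-tuning stage using as little as about 1% of the total tuning data samples.
Paper reference translation (metadata) (2023-12-07T23:26:06Z)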
On Evaluating Adversarial Robustness of Large Vision-Language Models [64.66104342002882]
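The paper evaluates the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting. In particular, it first crafts targeted adversarial examples against pretrained models such as CLIP and BLIP. Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
Paper reference translation (metadata) (2023-05-26T13:49:44Z)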

The related papers list is generated automatically from the titles and abstracts of papers on this site.

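The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from the use of this site (including all information and translations).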