Fugu-MT Paper Translation (Abstract): SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling

Paper summary: SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling

arxiv url: http://arxiv.org/abs/2508.09105v2
Date: Wed, 13 Aug 2025 11:05:22 GMT
Status: translation complete
Last updated in system: 2025-08-14 14:06:00.584106
Title: SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling
Authors: Shixuan Sun, Siyuan Liang, Ruoyu Chen, Jianjie Huang, Jingzhi Li, Xiaochun Cao
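Abstract summary: Retrieval-Augmented Generation (RAG) and its multimodal counterpart, Multimodal Retrieval-Augmented Generation (MRAG), significantly improve the knowledge coverage and contextual understanding of large language models (LLMs). However, retrieval and multimodal fusion obscure content provenance, so existing membership inference methods cannot reliably attribute generated outputs to pre-training, external retrieval, or user input, which undermines accountability for privacy leakage. This paper proposes the Source-aware Membership Audit (SMA), which enables fine-grained source attribution of generated content in a semi-black-box setting with retrieval control capabilities.
Reference score (independently computed attention score): 50.66950115630554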
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retrieval-Augmented Generation (RAG) and its multimodal counterpart, Multimodal Retrieval-Augmented Generation (MRAG), significantly improve the knowledge coverage and contextual understanding of Large Language Models (LLMs) by introducing external knowledge sources. However, retrieval and multimodal fusion obscure content provenance, rendering existing membership inference methods unable to reliably attribute generated outputs to pre-training, external retrieval, or user input, thus undermining privacy leakage accountability. To address these challenges, we propose the first Source-aware Membership Audit (SMA), which enables fine-grained source attribution of generated content in a semi-black-box setting with retrieval control capabilities. To address the environmental constraints of semi-black-box auditing, we further design an attribution estimation mechanism based on zero-order optimization, which robustly approximates the true influence of input tokens on the output through large-scale perturbation sampling and ridge regression modeling. In addition, SMA introduces a cross-modal attribution technique that projects image inputs into textual descriptions via MLLMs, enabling token-level attribution in the text modality, which for the first time facilitates membership inference on image retrieval traces in MRAG systems. This work shifts the focus of membership inference from 'whether the data has been memorized' to 'where the content is sourced from', offering a novel perspective for auditing data provenance in complex generative systems.
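The abstract describes estimating token influence from perturbation sampling and ridge regression without gradient access. The following is a minimal sketch of that general idea, not the authors' implementation: `estimate_token_influence`, `toy_score`, the masking scheme, and all parameter values are illustrative assumptions. A real audit would replace `toy_score` with a query to the system under audit that returns a scalar score for the output given a perturbed input.

```python
import numpy as np


def estimate_token_influence(score_fn, n_tokens, n_samples=2000,
                             keep_prob=0.5, ridge_lambda=1.0, seed=0):
    """Estimate per-token influence on a black-box score (hypothetical helper).

    score_fn takes a 0/1 mask of length n_tokens (1 = token kept,
    0 = token dropped) and returns a scalar. No gradients are used.
    """
    rng = np.random.default_rng(seed)
    # Perturbation sampling: each row is one random keep/drop pattern.
    masks = (rng.random((n_samples, n_tokens)) < keep_prob).astype(float)
    scores = np.array([score_fn(m) for m in masks])

    # Center features and targets so no explicit intercept is needed.
    X = masks - masks.mean(axis=0)
    y = scores - scores.mean()

    # Closed-form ridge regression: (X^T X + lambda * I) w = X^T y.
    A = X.T @ X + ridge_lambda * np.eye(n_tokens)
    return np.linalg.solve(A, X.T @ y)


def toy_score(mask):
    # Stand-in for the audited model (assumption): a known linear response,
    # so the recovered weights can be checked against ground truth.
    true_weights = np.array([0.0, 2.0, 0.0, -1.0, 0.5])
    return float(mask @ true_weights)


weights = estimate_token_influence(toy_score, n_tokens=5)
top_token = int(np.argmax(np.abs(weights)))  # index of the most influential token
print(np.round(weights, 2), top_token)
```

The ridge penalty keeps the solve well conditioned when the number of samples is small relative to the number of tokens, at the cost of slightly shrinking the estimated weights toward zero.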

Related papers