Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing
- URL: http://arxiv.org/abs/2511.01952v1
- Date: Mon, 03 Nov 2025 13:16:30 GMT
- Title: Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing
- Authors: Jinhua Yin, Peiru Yang, Chen Yang, Huili Wang, Zhiyang Hu, Shangguang Wang, Yongfeng Huang, Tao Qi
- Abstract summary: Large vision-language models (LVLMs) derive their capabilities from extensive training on vast corpora of visual and textual data. We propose the first black-box MIA framework for LVLMs, based on a prior knowledge-calibrated memory probing mechanism. Our method effectively identifies training data of LVLMs in a purely black-box setting and even achieves performance comparable to gray-box and white-box methods.
- Score: 25.68362027128315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large vision-language models (LVLMs) derive their capabilities from extensive training on vast corpora of visual and textual data. Empowered by large-scale parameters, these models often exhibit strong memorization of their training data, rendering them susceptible to membership inference attacks (MIAs). Existing MIA methods for LVLMs typically operate under white- or gray-box assumptions, extracting likelihood-based features for suspected data samples from the target LVLMs. However, mainstream LVLMs generally expose only generated outputs while concealing internal computational features during inference, limiting the applicability of these methods. In this work, we propose the first black-box MIA framework for LVLMs, based on a prior knowledge-calibrated memory probing mechanism. The core idea is to assess the model's memorization of private semantic information embedded within the suspected image data, which is unlikely to be inferred from general world knowledge alone. We conducted extensive experiments across four LVLMs and three datasets. Empirical results demonstrate that our method effectively identifies training data of LVLMs in a purely black-box setting and even achieves performance comparable to gray-box and white-box methods. Further analysis demonstrates the robustness of our method against potential adversarial manipulations and the effectiveness of its methodological design. Our code and data are available at https://github.com/spmede/KCMP.
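To make the probing idea above concrete, the sketch below shows one hypothetical way a prior knowledge-calibrated membership score could be computed in a purely black-box setting, where only generated answers are observable. The `ask_lvlm` interface, the masking step, the probe format, and the scoring rule are illustrative assumptions made for exposition; they are not the paper's actual KCMP implementation (see the linked repository for that).

```python
# Illustrative sketch only: a hypothetical prior knowledge-calibrated memory probe
# for black-box membership inference. All names and the scoring rule are assumptions.
from typing import Callable, List, Optional

def calibrated_membership_score(
    ask_lvlm: Callable[[str, Optional[bytes]], str],  # black-box query: (question, image or None) -> answer text
    mask_private_region: Callable[[bytes], bytes],    # hides the region containing the private detail
    probes: List[dict],                               # each probe: {"question": str, "private_answer": str}
    image: bytes,
) -> float:
    """Higher score = the model recovers private, non-inferable details of the
    suspected image more often than prior knowledge alone would allow."""
    def hit(answer: str, target: str) -> float:
        # Toy exact-substring check; a real system would use a softer semantic match.
        return float(target.lower() in answer.lower())

    masked = mask_private_region(image)
    probe_rate, prior_rate = 0.0, 0.0
    for p in probes:
        # Memory probe: the private detail is hidden, so a correct answer
        # suggests memorization rather than perception of the image.
        probe_rate += hit(ask_lvlm(p["question"], masked), p["private_answer"])
        # Prior-knowledge calibration: how often the same detail can be produced
        # with no image at all, i.e., from general world knowledge or guessing.
        prior_rate += hit(ask_lvlm(p["question"], None), p["private_answer"])

    n = max(len(probes), 1)
    return (probe_rate - prior_rate) / n  # calibrated memorization signal
```

Under this toy scoring rule, values near zero mean the model answers no better than prior knowledge alone, while clearly positive values suggest memorization of the image's private details.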
Related papers
- Deep Learning-based Method for Expressing Knowledge Boundary of Black-Box LLM [5.711910452650628]
Large Language Models (LLMs) have achieved remarkable success; however, the emergence of content generation distortion (hallucination) limits their practical applications.
This paper proposes LSCL (LLM-Supervised Confidence Learning), a deep learning-based method for expressing the knowledge boundaries of black-box LLMs.
arXiv Detail & Related papers (2026-02-11T12:42:59Z)
- PerProb: Indirectly Evaluating Memorization in Large Language Models [13.905375956316632]
We propose PerProb, a label-free framework for indirectly assessing LLM vulnerabilities.
PerProb evaluates changes in perplexity and average log probability between data generated by victim and adversary models (a minimal numerical illustration of these two signals appears after this list).
We evaluate PerProb's effectiveness across five datasets, revealing varying memory behaviors and privacy risks.
arXiv Detail & Related papers (2025-12-16T17:10:01Z)
- OpenLVLM-MIA: A Controlled Benchmark Revealing the Limits of Membership Inference Attacks on Large Vision-Language Models [8.88331104584743]
OpenLVLM-MIA is a new benchmark that highlights fundamental challenges in evaluating membership inference attacks (MIA) against large vision-language models (LVLMs).
We introduce a controlled benchmark of 6,000 images where the distributions of member and non-member samples are carefully balanced, and ground-truth membership labels are provided across three distinct training stages.
Experiments using OpenLVLM-MIA demonstrated that the performance of state-of-the-art MIA methods converged to random chance under unbiased conditions.
arXiv Detail & Related papers (2025-10-18T01:39:28Z)
- On the Evolution of Federated Post-Training Large Language Models: A Model Accessibility View [82.19096285469115]
Federated Learning (FL) enables training models across decentralized data silos while preserving client data privacy.
Recent research has explored efficient methods for post-training large language models (LLMs) within FL to address computational and communication challenges.
An inference-only paradigm (black-box FedLLM) has emerged to address these limitations.
arXiv Detail & Related papers (2025-08-22T09:52:31Z)
- Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training [49.75298684433045]
BBoxER induces an information bottleneck via implicit compression of training data.
We provide non-vacuous generalization bounds and strong theoretical guarantees for differential privacy, as well as robustness to data poisoning and extraction attacks.
arXiv Detail & Related papers (2025-07-02T14:29:30Z)
- Image Corruption-Inspired Membership Inference Attacks against Large Vision-Language Models [27.04420374256226]
Large vision-language models (LVLMs) have demonstrated outstanding performance in many downstream tasks.
It is important to detect whether an image was used to train the LVLM.
Recent studies have investigated membership inference attacks (MIAs) against LVLMs.
arXiv Detail & Related papers (2025-06-14T04:22:36Z)
- Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation [88.78166077081912]
We introduce a multimodal unlearning benchmark, UnLOK-VQA, and an attack-and-defense framework to evaluate methods for deleting specific multimodal knowledge from MLLMs.
Our results show multimodal attacks outperform text- or image-only ones, and that the most effective defense removes answer information from internal model states.
arXiv Detail & Related papers (2025-05-01T01:54:00Z)
- Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods [1.9799527196428242]
Large language model unlearning aims to remove harmful information that LLMs have learned, in order to prevent its use for malicious purposes.
We show that unlearning has a notable impact on general model capabilities.
We show that 5-shot prompting or rephrasing the question in simple ways can lead to an over ten-fold increase in accuracy on unlearning benchmarks.
arXiv Detail & Related papers (2024-11-18T22:31:17Z)
- Membership Inference Attacks against Large Vision-Language Models [40.996912464828696]
Large vision-language models (VLLMs) exhibit promising capabilities for processing multi-modal tasks across various application scenarios.
Their emergence also raises significant data security concerns, given the potential inclusion of sensitive information, such as private photos and medical records.
Detecting inappropriately used data in VLLMs remains a critical and unresolved issue.
arXiv Detail & Related papers (2024-11-05T08:35:08Z)
- Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent.
We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements.
Our findings show that instruction-tuned models can expose pre-training data as much as their base models, if not more so, and that using instructions proposed by other LLMs can open a new avenue of automated attacks.
arXiv Detail & Related papers (2024-03-05T19:32:01Z)
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
- Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance [51.30560006045442]
Image-gRounded guIdaNcE (MARINE) is a framework that is both training-free and API-free.
MARINE effectively and efficiently reduces object hallucinations during inference by introducing image-grounded guidance to LVLMs.
Our framework's flexibility further allows for the integration of multiple vision models, enabling more reliable and robust object-level guidance.
arXiv Detail & Related papers (2024-02-13T18:59:05Z)
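The PerProb entry in the list above relies on perplexity and average log probability as memorization signals. The snippet below is only a minimal illustration of the arithmetic behind those two quantities, computed from hypothetical per-token log-probabilities; it is not the PerProb pipeline itself.

```python
import math
from typing import List

def avg_log_prob(token_logprobs: List[float]) -> float:
    """Mean per-token log-probability of a sequence under a model."""
    return sum(token_logprobs) / len(token_logprobs)

def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity = exp(-mean log-probability); lower values mean the model
    finds the text more 'familiar', a common memorization signal."""
    return math.exp(-avg_log_prob(token_logprobs))

# Toy usage with made-up per-token log-probs: a memorized-looking sequence
# versus an unfamiliar one.
member_like = [-0.2, -0.1, -0.3, -0.15]
nonmember_like = [-2.1, -1.8, -2.5, -1.9]
print(perplexity(member_like), perplexity(nonmember_like))  # ~1.21 vs ~7.96
```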
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.