Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment
- URL: http://arxiv.org/abs/2505.03519v3
- Date: Sat, 24 May 2025 13:42:50 GMT
- Title: Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment
- Authors: Sy-Tuyen Ho, Koh Jun Hao, Ngoc-Bao Nguyen, Alexander Binder, Ngai-Man Cheung
- Abstract summary: Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to machine learning models T. The standard evaluation framework for such attacks relies on an evaluation model E, trained under the same task design as T. This framework has become the de facto standard for assessing progress in MI research, used across nearly all recent MI attacks and defenses without question.
- Score: 63.07424521895492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to machine learning models T. The standard framework for evaluating such attacks relies on an evaluation model E, trained under the same task design as T. This framework has become the de facto standard for assessing progress in MI research, used across nearly all recent MI attacks and defenses without question. In this paper, we present the first in-depth study of this MI evaluation framework. In particular, we identify a critical issue with it: Type-I adversarial examples. These are reconstructions that do not capture the visual features of the private training data, yet are still deemed successful by the target model T and ultimately transfer to E. Such false positives undermine the reliability of the standard MI evaluation framework. To address this issue, we introduce a new MI evaluation framework that replaces the evaluation model E with advanced Multimodal Large Language Models (MLLMs). By leveraging their general-purpose visual understanding, our MLLM-based framework does not depend on training under the same task design as T, thus reducing Type-I transferability and providing a more faithful assessment of reconstruction success. Using our MLLM-based evaluation framework, we reevaluate 26 diverse MI attack setups and empirically reveal consistently high false-positive rates under the standard evaluation framework. Importantly, we demonstrate that many state-of-the-art (SOTA) MI methods report inflated attack accuracy, indicating that actual privacy leakage is significantly lower than previously believed. By uncovering this critical issue and proposing a robust solution, our work enables a reassessment of progress in MI research and sets a new standard for reliable and robust evaluation.
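To make the proposed evaluation concrete, below is a minimal sketch of how an MLLM can stand in for the evaluation model E: the reconstruction is shown to a general-purpose vision-language model alongside a reference photo of the target identity, and attack success is counted only when the model judges both images to depict the same person. The prompt wording, the `gpt-4o` model choice, and the OpenAI-style vision API are illustrative assumptions, not the paper's exact implementation.

```python
import base64
from openai import OpenAI  # assumption: any OpenAI-compatible vision endpoint

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def encode_image(path: str) -> str:
    """Base64-encode an image file for inline transmission to the MLLM."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def mllm_judges_same_identity(reconstruction: str, reference: str,
                              model: str = "gpt-4o") -> bool:
    """Ask the MLLM whether two face images show the same person.

    This replaces the evaluation classifier E: because the MLLM is not
    trained under the target model's task design, Type-I adversarial
    reconstructions that fool both T and E should not transfer to it.
    """
    prompt = ("Image 1 is a reconstructed face; image 2 is a real photo. "
              "Do they show the same person? Answer strictly YES or NO.")
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/png;base64,{encode_image(reconstruction)}"}},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/png;base64,{encode_image(reference)}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

def mllm_attack_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Attack accuracy under this framework: fraction of (reconstruction,
    reference) pairs the MLLM accepts as matching the target identity."""
    hits = sum(mllm_judges_same_identity(rec, ref) for rec, ref in pairs)
    return hits / len(pairs)
```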
Related papers
- SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark [22.8804507954023]
We propose a unified attack taxonomy and formal definitions of data reconstruction attacks. We first propose a set of quantitative evaluation metrics that consider important criteria such as quantifiability, consistency, precision, and diversity. We leverage large language models (LLMs) as a substitute for human judgment, enabling visual evaluation with an emphasis on high-quality reconstructions.
arXiv Detail & Related papers (2025-06-09T16:00:48Z) - A Sample-Level Evaluation and Generative Framework for Model Inversion Attacks [26.585927770608105]
Model Inversion (MI) attacks pose significant privacy concerns in machine learning. Recent MI attacks have managed to reconstruct realistic label-level private data. We show that sample-level privacy, the private information of a single target sample, is also important but under-explored in the MI literature.
arXiv Detail & Related papers (2025-02-26T11:50:43Z) - Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark [62.58869921806019]
We propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset.
We design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6.
Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline.
arXiv Detail & Related papers (2024-11-23T08:06:06Z) - Model Inversion Attacks: A Survey of Approaches and Countermeasures [59.986922963781]
Recently, a new type of privacy attack, the model inversion attack (MIA), has emerged that aims to extract sensitive features of private training data.
Despite the significance, there is a lack of systematic studies that provide a comprehensive overview and deeper insights into MIAs.
This survey aims to summarize up-to-date MIA methods, covering both attacks and defenses.
arXiv Detail & Related papers (2024-11-15T08:09:28Z) - MIBench: A Comprehensive Framework for Benchmarking Model Inversion Attack and Defense [42.56467639172508]
Model Inversion (MI) attacks aim at leveraging the output information of target models to reconstruct privacy-sensitive training data. We build the first practical benchmark, named MIBench, for systematic evaluation of model inversion attacks and defenses.
arXiv Detail & Related papers (2024-10-07T16:13:49Z) - StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation [46.59416831869014]
We propose a novel evaluation framework referred to as StructEval.
Starting from an atomic test objective, StructEval deepens and broadens the evaluation by conducting a structured assessment across multiple cognitive levels and critical concepts.
Experiments on three widely-used benchmarks demonstrate that StructEval serves as a reliable tool for resisting the risk of data contamination.
arXiv Detail & Related papers (2024-08-06T16:28:30Z) - Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z) - Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models. It addresses two key challenges: the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning [17.141646895576145]
Federated Learning (FL) is a promising approach for training machine learning models on decentralized data. Membership Inference Attacks (MIAs) aim to determine whether a specific data point belongs to a target client's training set. We introduce a three-step Membership Inference Attack (MIA) method, called FedMIA, which follows an "all for one" principle, leveraging updates from all clients across multiple communication rounds to enhance MIA effectiveness.
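The abstract does not spell out the three steps, but the "all for one" idea can be illustrated with a generic gradient-similarity membership score: a candidate sample is scored against the updates of all clients over several rounds, and membership in the target client's data is suggested when the target's updates are consistently more aligned with the sample's gradient than the other clients' updates are. The rule below is a hypothetical sketch in that spirit, not FedMIA's actual procedure.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened gradient/update vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def all_for_one_score(sample_grad: list[np.ndarray],
                      target_updates: list[np.ndarray],
                      other_updates: list[list[np.ndarray]]) -> float:
    """Membership score for one candidate sample (illustrative only).

    sample_grad[r]      -- gradient of the candidate sample at round r
    target_updates[r]   -- target client's model update at round r
    other_updates[c][r] -- update of non-target client c at round r

    The target client's alignment with the sample's gradient is compared
    against the population of all other clients, round by round; a high
    average margin suggests the sample was in the target's training set.
    """
    margins = []
    for r, g in enumerate(sample_grad):
        target_sim = cosine(g, target_updates[r])
        peer_sims = [cosine(g, upd[r]) for upd in other_updates]
        margins.append(target_sim - float(np.mean(peer_sims)))
    return float(np.mean(margins))  # threshold this to decide membership
```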
arXiv Detail & Related papers (2024-02-09T09:58:35Z) - AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models [29.92550386563915]
Jailbreak attacks represent one of the most sophisticated threats to the security of large language models (LLMs). We introduce an innovative framework that can help evaluate the effectiveness of jailbreak attacks on LLMs. We present two distinct evaluation frameworks: a coarse-grained evaluation and a fine-grained evaluation.
arXiv Detail & Related papers (2024-01-17T06:42:44Z) - Don't Make Your LLM an Evaluation Benchmark Cheater [142.24553056600627]
Large language models (LLMs) have greatly advanced the frontiers of artificial intelligence, attaining remarkable improvement in model capacity.
To assess the model performance, a typical approach is to construct evaluation benchmarks for measuring the ability level of LLMs.
We discuss the potential risk and impact of inappropriately using evaluation benchmarks and misleadingly interpreting the evaluation results.
arXiv Detail & Related papers (2023-11-03T14:59:54Z) - Learning Evaluation Models from Large Language Models for Sequence Generation [61.8421748792555]
We propose a three-stage evaluation model training method that utilizes large language models to generate labeled data for model-based metric development. Experimental results on the SummEval benchmark demonstrate that CSEM can effectively train an evaluation model without human-labeled data.
arXiv Detail & Related papers (2023-08-08T16:41:16Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
We propose MESAS, the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z) - RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model, with the added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
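As a rough illustration of the "relaxed loss" idea, the training step below stops minimizing the loss once it falls below a target level alpha, keeping the train/test loss gap that MIAs exploit small. This is a simplified sketch of the general principle, not the full RelaxLoss algorithm, which involves additional steps described in the paper.

```python
import torch
import torch.nn as nn

def relaxed_loss_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                      x: torch.Tensor, y: torch.Tensor,
                      alpha: float = 0.5) -> float:
    """One training step with a relaxed loss target (simplified sketch).

    Driving the training loss toward zero widens the train/test loss gap
    that membership inference attacks exploit. Here the loss is only
    minimized while it exceeds a target level alpha; below alpha the
    update direction flips, so training hovers around loss ~= alpha
    instead of overfitting.
    """
    criterion = nn.CrossEntropyLoss()
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    # Gradient descent above the target level, gradient ascent below it.
    (loss if loss.item() >= alpha else -loss).backward()
    optimizer.step()
    return loss.item()
```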
arXiv Detail & Related papers (2022-07-12T19:34:47Z) - A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit OpenBackdoor to foster the implementations and evaluations of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z) - A Comprehensive Evaluation Framework for Deep Model Robustness [44.20580847861682]
Deep neural networks (DNNs) have achieved remarkable performance across a wide range of applications.
However, they are vulnerable to adversarial examples, which motivates research on adversarial defenses.
This paper presents a model evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics.
arXiv Detail & Related papers (2021-01-24T01:04:25Z)