Membership Inference Attack Against Masked Image Modeling
- URL: http://arxiv.org/abs/2408.06825v1
- Date: Tue, 13 Aug 2024 11:34:28 GMT
- Title: Membership Inference Attack Against Masked Image Modeling
- Authors: Zheng Li, Xinlei He, Ning Yu, Yang Zhang
- Abstract summary: Masked Image Modeling (MIM) has achieved significant success in the realm of self-supervised learning (SSL) for visual recognition.
In this work, we take a different angle by studying the pre-training data privacy of MIM.
We propose the first membership inference attack against image encoders pre-trained by MIM.
- Score: 29.699606401861818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked Image Modeling (MIM) has achieved significant success in the realm of self-supervised learning (SSL) for visual recognition. The image encoder pre-trained through MIM, involving the masking and subsequent reconstruction of input images, attains state-of-the-art performance in various downstream vision tasks. However, most existing works focus on improving the performance of MIM. In this work, we take a different angle by studying the pre-training data privacy of MIM. Specifically, we propose the first membership inference attack against image encoders pre-trained by MIM, which aims to determine whether an image is part of the MIM pre-training dataset. The key design is to simulate the pre-training paradigm of MIM, i.e., image masking and subsequent reconstruction, and then obtain reconstruction errors. These reconstruction errors can serve as membership signals for achieving attack goals, as the encoder is more capable of reconstructing the input image in its training set with lower errors. Extensive evaluations are conducted on three model architectures and three benchmark datasets. Empirical results show that our attack outperforms baseline methods. Additionally, we undertake intricate ablation studies to analyze multiple factors that could influence the performance of the attack.
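The idea in the abstract (mask an image, reconstruct it, and treat a low reconstruction error as evidence of membership) can be illustrated with a minimal sketch. This is not the authors' implementation. The `reconstruct` callable is a hypothetical stand-in for the target encoder plus whatever decoder the attacker uses, and `mask_ratio`, `n_trials`, and the decision threshold are illustrative assumptions rather than values from the paper. The toy model below simply memorizes one image so the example can run without any deep learning library.

```python
import random
from typing import Callable, List, Optional

Patch = List[float]
# A masked image: visible patches are kept, masked patches are None.
MaskedImage = List[Optional[Patch]]


def masked_reconstruction_error(
    patches: List[Patch],
    reconstruct: Callable[[MaskedImage], List[Patch]],  # hypothetical model interface
    mask_ratio: float = 0.75,  # assumed value, not taken from the paper
    n_trials: int = 8,         # assumed value, not taken from the paper
    seed: int = 0,
) -> float:
    """Average mean-squared error on masked patches over several random masks."""
    rng = random.Random(seed)
    n = len(patches)
    # Keep at least one patch masked and at least one patch visible.
    n_masked = min(n - 1, max(1, int(round(mask_ratio * n))))
    errors = []
    for _ in range(n_trials):
        masked_idx = set(rng.sample(range(n), n_masked))
        visible = [None if i in masked_idx else p for i, p in enumerate(patches)]
        predicted = reconstruct(visible)
        sq_sum, count = 0.0, 0
        for i in masked_idx:
            for target, guess in zip(patches[i], predicted[i]):
                sq_sum += (target - guess) ** 2
                count += 1
        errors.append(sq_sum / count)
    return sum(errors) / len(errors)


def infer_membership(error: float, threshold: float) -> bool:
    """Predict 'member' when the reconstruction error falls below the threshold."""
    return error < threshold


def make_toy_model(memorized: List[Patch]) -> Callable[[MaskedImage], List[Patch]]:
    """Toy stand-in for a pre-trained model that has memorized one training image."""
    def reconstruct(visible: MaskedImage) -> List[Patch]:
        # If the visible patches match the memorized image, recall it exactly.
        if all(v is None or v == m for v, m in zip(visible, memorized)):
            return [list(m) for m in memorized]
        # Otherwise fill every masked patch with the mean of the visible values.
        values = [x for v in visible if v is not None for x in v]
        mean = sum(values) / len(values)
        dim = len(memorized[0])
        return [list(v) if v is not None else [mean] * dim for v in visible]
    return reconstruct


# Two 16-patch toy images; only the first one was "seen" by the toy model.
member = [[i / 16.0] * 4 for i in range(16)]
non_member = [[((i * 7) % 16) / 16.0 + 0.03] * 4 for i in range(16)]
model = make_toy_model(member)

member_error = masked_reconstruction_error(member, model)
non_member_error = masked_reconstruction_error(non_member, model)
```

In this toy setting the memorized image is reconstructed with zero error while the unseen image is not, so a small threshold separates the two. With a real encoder the gap would be far smaller, and the threshold would have to be calibrated by the attacker.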
Related papers
- AEMIM: Adversarial Examples Meet Masked Image Modeling [12.072673694665934]
We propose to incorporate adversarial examples into masked image modeling, as the new reconstruction targets.
In particular, we introduce a novel auxiliary pretext task that reconstructs the adversarial examples corresponding to the original images.
We also devise an innovative adversarial attack to craft more suitable adversarial examples for MIM pre-training.
arXiv Detail & Related papers (2024-07-16T09:39:13Z)
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [103.72844619581811]
We build performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices.
We demonstrate that large-scale multimodal pre-training benefits from a careful mix of image-caption, interleaved image-text, and text-only data.
arXiv Detail & Related papers (2024-03-14T17:51:32Z)
- VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models [58.21452697997078]
We propose a novel VQAttack model, which can generate both image and text perturbations with the designed modules.
Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQAttack.
arXiv Detail & Related papers (2024-02-16T21:17:42Z)
- PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling [83.67628239775878]
Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.
This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction.
We propose a remarkably simple and effective method, PixMIM, that entails two strategies.
arXiv Detail & Related papers (2023-03-04T13:38:51Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- MimCo: Masked Image Modeling Pre-training with Contrastive Teacher [14.413674270588023]
Masked image modeling (MIM) has received much attention in self-supervised learning (SSL).
Visualizations show that the learned representations are less separable, especially compared to those based on contrastive learning pre-training.
We propose a novel and flexible pre-training framework, named MimCo, which combines MIM and contrastive learning through two-stage pre-training.
arXiv Detail & Related papers (2022-09-07T10:59:05Z)
- Improvements to Self-Supervised Representation Learning for Masked Image Modeling [0.0]
This paper explores improvements to the masked image modeling (MIM) paradigm.
The MIM paradigm enables the model to learn the main object features of the image by masking the input image and predicting the masked part by the unmasked part.
We propose a new model, Contrastive Masked AutoEncoders (CMAE).
arXiv Detail & Related papers (2022-05-21T09:45:50Z)
- Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers [122.01591448013977]
Masked image modeling (MIM) has demonstrated promising results on downstream tasks.
In this paper, we investigate whether there exist other effective ways to learn by recovering missing contents.
We summarize a few design principles for token-based pre-training of vision transformers.
This design achieves superior performance over MIM in a series of downstream recognition tasks without extra computational cost.
arXiv Detail & Related papers (2022-03-27T14:23:29Z) - Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling these two designs enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z)