AEMIM: Adversarial Examples Meet Masked Image Modeling
- URL: http://arxiv.org/abs/2407.11537v1
- Date: Tue, 16 Jul 2024 09:39:13 GMT
- Title: AEMIM: Adversarial Examples Meet Masked Image Modeling
- Authors: Wenzhao Xiang, Chang Liu, Hang Su, Hongyang Yu
- Abstract summary: We propose to incorporate adversarial examples into masked image modeling, as the new reconstruction targets.
In particular, we introduce a novel auxiliary pretext task that reconstructs the adversarial examples corresponding to the original images.
We also devise an innovative adversarial attack to craft more suitable adversarial examples for MIM pre-training.
- Score: 12.072673694665934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked image modeling (MIM) has gained significant traction for its remarkable prowess in representation learning. As an alternative to the traditional approach, the reconstruction from corrupted images has recently emerged as a promising pretext task. However, the regular corrupted images are generated using generic generators, often lacking relevance to the specific reconstruction task involved in pre-training. Hence, reconstruction from regular corrupted images cannot ensure the difficulty of the pretext task, potentially leading to a performance decline. Moreover, generating corrupted images might introduce an extra generator, resulting in a notable computational burden. To address these issues, we propose to incorporate adversarial examples into masked image modeling, as the new reconstruction targets. Adversarial examples, generated online using only the trained models, can directly aim to disrupt tasks associated with pre-training. Therefore, the incorporation not only elevates the level of challenge in reconstruction but also enhances efficiency, contributing to the acquisition of superior representations by the model. In particular, we introduce a novel auxiliary pretext task that reconstructs the adversarial examples corresponding to the original images. We also devise an innovative adversarial attack to craft more suitable adversarial examples for MIM pre-training. It is noted that our method is not restricted to specific model architectures and MIM strategies, rendering it an adaptable plug-in capable of enhancing all MIM methods. Experimental findings substantiate the remarkable capability of our approach in amplifying the generalization and robustness of existing MIM methods. Notably, our method surpasses the performance of baselines on various tasks, including ImageNet, its variants, and other downstream tasks.
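The abstract describes crafting adversarial examples online, using only the model being pre-trained, so that the perturbation directly increases the difficulty of the reconstruction pretext task. The paper's actual attack is not given here; the following is a minimal sketch of the general idea — a single FGSM-style step against a toy linear decoder whose input gradient is available in closed form. All names (`fgsm_reconstruction_attack`, `W`) are illustrative assumptions, not the authors' method.

```python
import numpy as np

def fgsm_reconstruction_attack(W, x, target, eps):
    """One FGSM-style step against a toy linear reconstruction loss.

    Loss: L(x) = ||W @ x - target||^2, whose input gradient is
    grad_x = 2 * W.T @ (W @ x - target). The adversarial example
    x_adv = x + eps * sign(grad_x) is crafted to *increase* the
    reconstruction loss, making the pretext task harder.
    """
    grad = 2.0 * W.T @ (W @ x - target)   # closed-form input gradient
    return x + eps * np.sign(grad)        # single signed-gradient step

# Toy example: a random linear "decoder" and input vector.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
x = rng.normal(size=16)
target = rng.normal(size=8)

x_adv = fgsm_reconstruction_attack(W, x, target, eps=0.03)

def loss(v):
    return float(np.sum((W @ v - target) ** 2))

print(np.max(np.abs(x_adv - x)))  # perturbation stays within the eps-ball
print(loss(x_adv) > loss(x))      # True: the loss strictly increases
```

Because the loss is a convex quadratic, the signed-gradient step is guaranteed to increase it (the first-order term is eps times the gradient's L1 norm, and the quadratic term is non-negative), which is the sense in which the corrupted input is "harder" than a generic corruption.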
Related papers
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike the discretization line of methods, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR achieves substantially better performance than other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
- Focus On What Matters: Separated Models For Visual-Based RL Generalization [16.87505461758058]
Separated Models for Generalization (SMG) is a novel approach that exploits image reconstruction for generalization.
SMG incorporates two additional consistency losses to guide the agent's focus toward task-relevant areas across different scenarios.
Experiments in DMC demonstrate the SOTA performance of SMG in generalization, particularly excelling in video-background settings.
arXiv Detail & Related papers (2024-09-29T04:37:56Z)
- Membership Inference Attack Against Masked Image Modeling [29.699606401861818]
Masked Image Modeling (MIM) has achieved significant success in the realm of self-supervised learning (SSL) for visual recognition.
In this work, we take a different angle by studying the pre-training data privacy of MIM.
We propose the first membership inference attack against image encoders pre-trained by MIM.
arXiv Detail & Related papers (2024-08-13T11:34:28Z)
- Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning [18.424840375721303]
Masked Image Modeling (MIM) has emerged as a promising method for deriving visual representations from unlabeled image data by predicting missing pixels from masked portions of images.
A promising yet unrealized framework is learning representations through masked reconstruction in latent space, combining the locality of MIM with the high-level targets.
This study is among the first to thoroughly analyze and address the challenges of such framework, which we refer to as Latent MIM.
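The MIM blurb above describes the core pretext task: predict missing pixels from masked portions of images. A minimal sketch of the corruption step — patchifying an image and zeroing a random subset of patches (75% is a commonly used ratio, assumed here for illustration; function and parameter names are hypothetical):

```python
import numpy as np

def random_patch_mask(image, patch=4, mask_ratio=0.75, seed=0):
    """Split an image into non-overlapping patches and mask a random subset.

    Returns the corrupted image (masked patches zeroed) and the boolean
    patch mask; a MIM model is trained to predict the pixels of the
    masked patches from the visible ones.
    """
    h, w = image.shape
    gh, gw = h // patch, w // patch
    n = gh * gw
    rng = np.random.default_rng(seed)
    masked_ids = rng.choice(n, size=int(n * mask_ratio), replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[masked_ids] = True

    corrupted = image.copy()
    for idx in np.flatnonzero(mask):
        r, c = divmod(idx, gw)                 # patch grid coordinates
        corrupted[r*patch:(r+1)*patch, c*patch:(c+1)*patch] = 0.0
    return corrupted, mask

img = np.arange(32 * 32, dtype=float).reshape(32, 32)
corrupted, mask = random_patch_mask(img)
print(mask.sum(), mask.size)  # 48 of 64 patches are masked (75%)
```

The reconstruction loss is then computed only on (or at least weighted toward) the masked patches, which is what distinguishes MIM from plain autoencoding.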
arXiv Detail & Related papers (2024-07-22T17:54:41Z)
- Double-Flow GAN model for the reconstruction of perceived faces from brain activities [13.707575848841405]
We proposed a novel reconstruction framework, which we called Double-Flow GAN.
We also designed a pretraining process that uses features extracted from images as conditions for making it possible to pretrain the conditional reconstruction model from fMRI.
Results showed that the proposed method accurately reconstructs multiple face attributes, outperforms previous reconstruction models, and exhibits state-of-the-art reconstruction ability.
arXiv Detail & Related papers (2023-12-12T18:07:57Z)
- Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts [52.39959535724677]
We introduce an alternative solution to improve the generalization of image restoration models.
We propose AdaptIR, a Mixture-of-Experts (MoE) with multi-branch design to capture local, global, and channel representation bases.
Our AdaptIR achieves stable performance on single-degradation tasks, and excels in hybrid-degradation tasks, with fine-tuning only 0.6% parameters for 8 hours.
arXiv Detail & Related papers (2023-12-12T14:27:59Z)
- PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance [65.5618804029422]
Previous works have achieved noteworthy success by limiting the solution space using explicit degradation models.
We propose PGDiff by introducing partial guidance, a fresh perspective that is more adaptable to real-world degradations.
Our method not only outperforms existing diffusion-prior-based approaches but also competes favorably with task-specific models.
arXiv Detail & Related papers (2023-09-19T17:51:33Z)
- Hard Patches Mining for Masked Image Modeling [52.46714618641274]
Masked image modeling (MIM) has attracted much research attention due to its promising potential for learning scalable visual representations.
We propose Hard Patches Mining (HPM), a brand-new framework for MIM pre-training.
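The HPM summary leaves the selection criterion implicit; one plausible reading of "hard patches mining" is that an auxiliary predictor scores each patch's reconstruction difficulty, and the hardest patches are the ones masked. The sketch below shows only that selection step, with hypothetical names and toy scores — it is not HPM's full framework:

```python
import numpy as np

def hardest_patch_mask(predicted_losses, mask_ratio=0.5):
    """Mask the patches with the highest predicted reconstruction loss,
    so pre-training concentrates on the hardest-to-reconstruct regions."""
    n = predicted_losses.shape[0]
    n_masked = int(n * mask_ratio)
    hardest = np.argsort(predicted_losses)[::-1][:n_masked]  # descending
    mask = np.zeros(n, dtype=bool)
    mask[hardest] = True
    return mask

# Toy per-patch difficulty scores from a hypothetical auxiliary predictor.
losses = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6])
mask = hardest_patch_mask(losses, mask_ratio=0.5)
print(np.flatnonzero(mask))  # indices of the 4 hardest patches
```

In practice such a scheme is usually annealed (easy-to-hard) rather than applied greedily from the start, since masking only the hardest patches from epoch 0 can destabilize training.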
arXiv Detail & Related papers (2023-04-12T15:38:23Z)
- DPPMask: Masked Image Modeling with Determinantal Point Processes [49.65141962357528]
Masked Image Modeling (MIM) has achieved impressive representative performance with the aim of reconstructing randomly masked images.
We show that uniformly random masking widely used in previous works unavoidably loses some key objects and changes original semantic information.
To address this issue, we augment MIM with a new masking strategy namely the DPPMask.
Our method is simple yet effective and requires no extra learnable parameters when implemented within various frameworks.
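DPPMask's exact kernel and sampler are not given in this summary; the sketch below illustrates the underlying idea — determinantal selection over patch features skips near-duplicate patches, so the kept (visible) set preserves key objects — using generic greedy MAP inference for a DPP. The kernel construction and all names are assumptions for illustration:

```python
import numpy as np

def greedy_dpp_select(L, k):
    """Greedy MAP inference for a DPP with PSD kernel L: repeatedly add
    the item that most increases log-det of the selected submatrix,
    yielding a diverse subset (here: which patches to keep visible)."""
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            ids = selected + [j]
            sign, logdet = np.linalg.slogdet(L[np.ix_(ids, ids)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = j, logdet
        if best is None:
            break
        selected.append(best)
    return selected

# Three toy patch feature vectors: patches 0 and 1 are duplicates.
F = np.array([[1.0, 0.0],    # patch 0
              [1.0, 0.0],    # patch 1: redundant copy of patch 0
              [0.0, 1.0]])   # patch 2: distinct content
L = F @ F.T + 0.01 * np.eye(3)   # PSD similarity kernel
keep = greedy_dpp_select(L, k=2)
print(keep)  # [0, 2] -- the redundant patch is skipped
```

Because the determinant of a submatrix with two near-identical rows is close to zero, the DPP objective penalizes selecting redundant patches, which is how such a masking strategy avoids hiding every copy of a key object.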
arXiv Detail & Related papers (2023-03-13T13:40:39Z)
- PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling [83.67628239775878]
Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.
This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction.
We propose a remarkably simple and effective method, PixMIM, that entails two strategies.
arXiv Detail & Related papers (2023-03-04T13:38:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.