MAGMA: Manifold Regularization for MAEs
- URL: http://arxiv.org/abs/2412.02871v2
- Date: Thu, 05 Dec 2024 19:12:43 GMT
- Title: MAGMA: Manifold Regularization for MAEs
- Authors: Alin Dondera, Anuj Singh, Hadi Jamali-Rad
- Abstract summary: Masked Autoencoders (MAEs) form an important branch of self-supervised learning (SSL).
We introduce MAGMA, a novel batch-wide layer-wise regularization loss applied to representations of different Transformer layers.
We demonstrate that by plugging in the proposed regularization loss, one can significantly improve the performance of MAE-based models.
- Score: 1.7478203318226315
- Abstract: Masked Autoencoders (MAEs) form an important branch of self-supervised learning (SSL) because they do not depend on the augmentation techniques used to generate positive (and/or negative) pairs in contrastive frameworks. Their masking-and-reconstruction strategy also aligns nicely with SSL approaches in natural language processing. Most MAEs are built on Transformer-based architectures whose visual features are not regularized, unlike their convolutional neural network (CNN) based counterparts, which can potentially hinder their performance. To address this, we introduce MAGMA, a novel batch-wide layer-wise regularization loss applied to the representations of different Transformer layers. We demonstrate that plugging in the proposed regularization loss significantly improves the performance of MAE-based models. We further demonstrate the impact of the proposed loss on optimizing other generic SSL approaches (such as VICReg and SimCLR), broadening the impact of the proposed approach. Our code base can be found at https://github.com/adondera/magma.
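The abstract describes the loss only at a high level (batch-wide, layer-wise, applied to Transformer representations), so the following is a minimal sketch of what a manifold-style regularizer of that kind could look like: a graph-Laplacian penalty asking that samples close in one layer's representation space remain close in the next. The function names, the affinity construction, the layer pairing, and the weighting are assumptions for illustration, not MAGMA's exact formulation; see the repository linked above for the authors' implementation.

```python
import torch
import torch.nn.functional as F

def manifold_regularizer(z_prev: torch.Tensor, z_next: torch.Tensor,
                         sigma: float = 1.0) -> torch.Tensor:
    """Graph-Laplacian penalty: samples that are close in one layer's
    representation space should stay close in the next layer's space.

    z_prev, z_next: [batch, dim] token-pooled representations of two layers.
    Illustrative only -- affinity construction, layer pairing, and weighting
    are assumptions, not MAGMA's exact loss.
    """
    a = F.normalize(z_prev, dim=1)
    b = F.normalize(z_next, dim=1)
    w = torch.exp(-torch.cdist(a, a).pow(2) / (2 * sigma ** 2))  # batch affinity
    laplacian = torch.diag(w.sum(dim=1)) - w                     # graph Laplacian
    # tr(B^T L B) = 0.5 * sum_ij w_ij * ||b_i - b_j||^2
    return torch.trace(b.t() @ laplacian @ b) / b.shape[0]


def magma_style_loss(layer_outputs, weight: float = 0.01) -> torch.Tensor:
    """Sum the penalty over consecutive Transformer layers; such a term would
    be added on top of the usual MAE reconstruction loss."""
    pooled = [h.mean(dim=1) for h in layer_outputs]   # pool tokens: [batch, dim]
    return weight * sum(manifold_regularizer(p, q)
                        for p, q in zip(pooled[:-1], pooled[1:]))
```

In use, `layer_outputs` would hold the [batch, tokens, dim] outputs of the chosen Transformer blocks collected during the forward pass; which layers to regularize and how to weight the term are exactly the design choices the paper studies.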
Related papers
- Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder [61.7834263332332]
We develop Masked Auto-Encoder (MAE) as a unified, modality-agnostic SSL framework.
We argue that meta-learning is a key to interpreting MAE as a modality-agnostic learner.
Our experiment demonstrates the superiority of MetaMAE in the modality-agnostic SSL benchmark.
arXiv Detail & Related papers (2023-10-25T03:03:34Z)
- CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model [22.870512676002463]
This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators.
Inspired by these observations, we propose CRaSh, a training-free strategy involving Clustering, Removing, and Sharing to derive improved emulators from LLMs.
Our findings demonstrate linear connectivity among optima lying in the same basin, highlighting the effectiveness of CRaSh and OFT.
arXiv Detail & Related papers (2023-10-24T03:08:58Z)
- AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning [3.4754314910585626]
We propose a parameter-efficient paradigm for fine-tuning PrLMs based on adapters.
We show that our solution achieves comparable or superior performance to full-scale PrLM fine-tuning and prompt-tuning baselines.
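For readers unfamiliar with adapter-tuning, the sketch below shows the standard bottleneck adapter pattern that this family of methods builds on: small trainable layers inserted into a frozen pre-trained backbone. It illustrates the general paradigm rather than AdapterEM's specific architecture; names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.
    Only these small layers (plus the task head) are updated during fine-tuning."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Typical usage: freeze the pre-trained backbone and train only the adapters,
# which is what makes the approach parameter-efficient.
# backbone = ...  # a pre-trained Transformer encoder (hypothetical)
# for p in backbone.parameters():
#     p.requires_grad = False
```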
arXiv Detail & Related papers (2023-05-30T04:03:23Z)
- Magic ELF: Image Deraining Meets Association Learning and Transformer [63.761812092934576]
This paper aims to unify CNN and Transformer to take advantage of their learning merits for image deraining.
A novel multi-input attention module (MAM) is proposed to associate rain removal and background recovery.
Our proposed method (dubbed ELF) outperforms the state-of-the-art approach (MPRNet) by 0.25 dB on average.
arXiv Detail & Related papers (2022-07-21T12:50:54Z)
- Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency [7.940705941237998]
We propose PACMAC, a simple two-stage adaptation algorithm for self-supervised ViTs.
Our simple approach leads to consistent performance gains over competing methods.
arXiv Detail & Related papers (2022-06-16T14:46:10Z)
- A Reinforcement Learning Approach for Sequential Spatial Transformer Networks [6.585049648605185]
We formulate the task as a Markov Decision Process (MDP) and use reinforcement learning (RL) to solve this sequential decision-making problem.
Our method is therefore not bound by the differentiability of the sampling modules.
We design multiple experiments to verify the effectiveness of our method using cluttered MNIST and Fashion-MNIST datasets.
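As a rough illustration of treating spatial transformation as a sequential decision problem, the sketch below lets a small policy network pick one discrete transformation per step and trains it with REINFORCE, so no gradient has to flow through the (non-differentiable) transformation itself. The action set (simple pixel shifts rather than a full spatial transformer), the reward, and the network are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Discrete actions: do nothing, or shift the image a few pixels in one direction.
# torch.roll is non-differentiable w.r.t. the shift, which is why a policy-gradient
# method (REINFORCE) is used instead of backpropagating through the transform.
ACTIONS = [(0, 0), (-3, 0), (3, 0), (0, -3), (0, 3)]

class Policy(nn.Module):
    def __init__(self, n_actions: int = len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, x):
        return self.net(x)  # action logits, shape [batch, n_actions]

def reinforce_step(policy, optimizer, images, reward_fn, steps: int = 3):
    """One episode: apply `steps` sampled shifts, then reward the final image.
    `reward_fn` returns a per-sample reward, e.g. the confidence of a downstream
    classifier on the transformed images."""
    log_probs = []
    x = images                                  # [batch, 1, 28, 28]
    for _ in range(steps):
        dist = torch.distributions.Categorical(logits=policy(x))
        a = dist.sample()                       # [batch] action indices
        log_probs.append(dist.log_prob(a))
        x = torch.stack([torch.roll(img, shifts=ACTIONS[i], dims=(-2, -1))
                         for img, i in zip(x, a.tolist())])
    reward = reward_fn(x).detach()              # [batch], no gradient needed
    loss = -(torch.stack(log_probs).sum(dim=0) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```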
arXiv Detail & Related papers (2021-06-27T17:41:17Z)
- Efficient Semantic Image Synthesis via Class-Adaptive Normalization [116.63715955932174]
Class-adaptive normalization (CLADE) is a lightweight but equally effective variant of spatially-adaptive normalization (SPADE) that is adaptive only to the semantic class.
We introduce intra-class positional map encoding calculated from semantic layouts to modulate the normalization parameters of CLADE.
The proposed CLADE generalizes to different SPADE-based methods while achieving generation quality comparable to SPADE.
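To make the "adaptive only to the semantic class" idea concrete, here is a minimal sketch of class-adaptive normalization: the modulation parameters are looked up per class from the semantic label map rather than predicted by convolutions over it as in SPADE. The layer below is an illustrative reconstruction, not the authors' code, and it omits the intra-class positional map encoding mentioned above.

```python
import torch
import torch.nn as nn

class ClassAdaptiveNorm(nn.Module):
    """Normalize features, then modulate them with per-class scale and shift
    looked up from the semantic label map (one gamma/beta pair per class)."""
    def __init__(self, num_classes: int, channels: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels, affine=False)
        self.gamma = nn.Embedding(num_classes, channels)
        self.beta = nn.Embedding(num_classes, channels)

    def forward(self, x: torch.Tensor, label_map: torch.Tensor) -> torch.Tensor:
        # x: [N, C, H, W] features; label_map: [N, H, W] integer class ids
        gamma = self.gamma(label_map).permute(0, 3, 1, 2)  # -> [N, C, H, W]
        beta = self.beta(label_map).permute(0, 3, 1, 2)
        return self.norm(x) * gamma + beta

# Example (shapes only):
# clade = ClassAdaptiveNorm(num_classes=35, channels=64)
# out = clade(torch.randn(2, 64, 32, 32), torch.randint(0, 35, (2, 32, 32)))
```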
arXiv Detail & Related papers (2020-12-08T18:59:32Z)
- Modal Regression based Structured Low-rank Matrix Recovery for Multi-view Learning [70.57193072829288]
Low-rank Multi-view Subspace Learning (LMvSL) has shown great potential for cross-view classification in recent years.
However, existing LMvSL-based methods cannot handle view discrepancy and discriminancy simultaneously.
We propose Structured Low-rank Matrix Recovery (SLMR), a method that effectively removes view discrepancy while improving discriminancy.
arXiv Detail & Related papers (2020-03-22T03:57:38Z)