Related papers: Learning Mask Invariant Mutual Information for Masked Image Modeling

URL: http://arxiv.org/abs/2502.19718v1
Date: Thu, 27 Feb 2025 03:19:05 GMT
Title: Learning Mask Invariant Mutual Information for Masked Image Modeling
Authors: Tao Huang, Yanxiang Ma, Shan You, Chang Xu
Abstract summary: Masked autoencoders (MAEs) represent a prominent self-supervised learning paradigm in computer vision. Recent studies have attempted to elucidate the functioning of MAEs through contrastive learning and feature representation analysis. We propose a new perspective for understanding MAEs by leveraging the information bottleneck principle in information theory.
Score: 35.63719638508299
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Masked autoencoders (MAEs) represent a prominent self-supervised learning paradigm in computer vision. Despite their empirical success, the underlying mechanisms of MAEs remain insufficiently understood. Recent studies have attempted to elucidate the functioning of MAEs through contrastive learning and feature representation analysis, yet these approaches often provide only implicit insights. In this paper, we propose a new perspective for understanding MAEs by leveraging the information bottleneck principle in information theory. Our theoretical analyses reveal that optimizing the latent features to balance relevant and irrelevant information is key to improving MAE performance. Building upon our proofs, we introduce MI-MAE, a novel method that optimizes MAEs through mutual information maximization and minimization. By enhancing latent features to retain maximal relevant information between them and the output, and minimizing irrelevant information between them and the input, our approach achieves better performance. Extensive experiments on standard benchmarks show that MI-MAE significantly outperforms MAE models in tasks such as image classification, object detection, and semantic segmentation. Our findings validate the theoretical framework and highlight the practical advantages of applying the information bottleneck principle to MAEs, offering deeper insights for developing more powerful self-supervised learning models.

Related papers

How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks. We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z)
SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction [17.44991827937427]
Masked Image Modeling techniques have redefined the landscape of computer vision. Despite their success, the full potential of MIM-based methods in dense prediction tasks, particularly in depth estimation, remains untapped. We propose SG-MIM, a novel Structured knowledge Guided Masked Image Modeling framework designed to enhance dense prediction tasks by utilizing structured knowledge alongside images.
arXiv Detail & Related papers (2024-09-04T08:24:53Z)
Understanding Masked Autoencoders From a Local Contrastive Perspective [80.57196495601826]
Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies. We introduce a new empirical framework, called Local Contrastive MAE, to analyze both reconstructive and contrastive aspects of MAE.
arXiv Detail & Related papers (2023-10-03T12:08:15Z)
Understanding Masked Autoencoders via Hierarchical Latent Variable Models [109.35382136147349]
Masked autoencoder (MAE) has recently achieved prominent success in a variety of vision tasks. Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking.
arXiv Detail & Related papers (2023-06-08T03:00:10Z)
MA2CL:Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL). MA2CL encourages the learned representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space. Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z)
Tackling Visual Control via Multi-View Exploration Maximization [64.8463574294237]
MEM is the first approach that combines multi-view representation learning and reward-driven exploration in reinforcement learning (RL). We evaluate MEM on various tasks from DeepMind Control Suite and Procgen games.
arXiv Detail & Related papers (2022-11-28T11:29:56Z)
i-MAE: Are Latent Representations in Masked Autoencoders Linearly Separable? [26.146459754995597]
Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training approach in the vision domain. This paper aims to explore an interactive Masked Autoencoders (i-MAE) framework to enhance the representation capability. In addition to qualitatively analyzing the characteristics of the latent representations, we examine the existence of linear separability and the degree of semantics in the latent space.
arXiv Detail & Related papers (2022-10-20T17:59:54Z)
MINER: Improving Out-of-Vocabulary Named Entity Recognition from an Information Theoretic Perspective [57.19660234992812]
NER models have achieved promising performance on standard NER benchmarks. Recent studies show that previous approaches may over-rely on entity mention information, resulting in poor performance on out-of-vocabulary (OOV) entity recognition. We propose MINER, a novel NER learning framework, to remedy this issue from an information-theoretic perspective.
arXiv Detail & Related papers (2022-04-09T05:18:20Z)
Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models [76.48370548802464]
This paper focuses on conducting a series of analytical experiments to examine the relations between the multi-head self-attention and the final MRC system performance. We discover that passage-to-question and passage understanding attentions are the most important ones in the question answering process. Through comprehensive visualizations and case studies, we also observe several general findings on the attention maps, which can be helpful to understand how these models solve the questions.
arXiv Detail & Related papers (2021-08-26T04:23:57Z)
MAML is a Noisy Contrastive Learner [72.04430033118426]
Model-agnostic meta-learning (MAML) is one of the most popular and widely adopted meta-learning algorithms. We provide a new perspective on the working mechanism of MAML and discover that MAML is analogous to a meta-learner using a supervised contrastive objective function. We propose a simple but effective technique, the zeroing trick, to alleviate such interference.
arXiv Detail & Related papers (2021-06-29T12:52:26Z)