Masked Modeling for Self-supervised Representation Learning on Vision
and Beyond
- URL: http://arxiv.org/abs/2401.00897v2
- Date: Tue, 9 Jan 2024 16:09:47 GMT
- Title: Masked Modeling for Self-supervised Representation Learning on Vision
and Beyond
- Authors: Siyuan Li, Luyuan Zhang, Zedong Wang, Di Wu, Lirong Wu, Zicheng Liu,
Jun Xia, Cheng Tan, Yang Liu, Baigui Sun, Stan Z. Li
- Abstract summary: Masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training.
We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more.
We conclude by discussing the limitations of current techniques and point out several potential avenues for advancing masked modeling research.
- Score: 69.64364187449773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the deep learning revolution marches on, self-supervised learning has
garnered increasing attention in recent years thanks to its remarkable
representation learning ability and the low dependence on labeled data. Among
these varied self-supervised techniques, masked modeling has emerged as a
distinctive approach that involves predicting parts of the original data that
are proportionally masked during training. This paradigm enables deep models to
learn robust representations and has demonstrated exceptional performance in
the context of computer vision, natural language processing, and other
modalities. In this survey, we present a comprehensive review of the masked
modeling framework and its methodology. We elaborate on the details of
techniques within masked modeling, including diverse masking strategies,
recovering targets, network architectures, and more. Then, we systematically
investigate its wide-ranging applications across domains. Furthermore, we also
explore the commonalities and differences between masked modeling methods in
different fields. Toward the end of this paper, we conclude by discussing the
limitations of current techniques and point out several potential avenues for
advancing masked modeling research. A paper list project with this survey is
available at \url{https://github.com/Lupin1998/Awesome-MIM}.
Related papers
- Masked Image Modeling Boosting Semi-Supervised Semantic Segmentation [38.55611683982936]
We introduce a novel class-wise masked image modeling that independently reconstructs different image regions according to their respective classes.
We develop a feature aggregation strategy that minimizes the distances between features corresponding to the masked and visible parts within the same class.
In semantic space, we explore the application of masked image modeling to enhance regularization.
arXiv Detail & Related papers (2024-11-13T16:42:07Z) - Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [89.40778301238642]
Model merging is an efficient empowerment technique in the machine learning community.
There is a significant gap in the literature regarding a systematic and thorough review of these techniques.
arXiv Detail & Related papers (2024-08-14T16:58:48Z) - Masked Image Modeling: A Survey [73.21154550957898]
Masked image modeling emerged as a powerful self-supervised learning technique in computer vision.
We construct a taxonomy and review the most prominent papers in recent years.
We aggregate the performance results of various masked image modeling methods on the most popular datasets.
arXiv Detail & Related papers (2024-08-13T07:27:02Z) - Advances in Diffusion Models for Image Data Augmentation: A Review of Methods, Models, Evaluation Metrics and Future Research Directions [6.2719115566879236]
Diffusion Models (DMs) have emerged as a powerful tool for image data augmentation.
DMs generate realistic and diverse images by learning the underlying data distribution.
Current challenges and future research directions in the field are discussed.
arXiv Detail & Related papers (2024-07-04T18:06:48Z) - Revisiting Self-supervised Learning of Speech Representation from a
Mutual Information Perspective [68.20531518525273]
We take a closer look into existing self-supervised methods of speech from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
arXiv Detail & Related papers (2024-01-16T21:13:22Z) - MinT: Boosting Generalization in Mathematical Reasoning via Multi-View
Fine-Tuning [53.90744622542961]
Reasoning in mathematical domains remains a significant challenge for small language models (LMs)
We introduce a new method that exploits existing mathematical problem datasets with diverse annotation styles.
Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches.
arXiv Detail & Related papers (2023-07-16T05:41:53Z) - Pre-training Contextualized World Models with In-the-wild Videos for
Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z) - Internal Representations of Vision Models Through the Lens of Frames on
Data Manifolds [8.67467876089153]
We present a new approach to studying such representations inspired by the idea of a frame on the tangent bundle of a manifold.
Our construction, which we call a neural frame, is formed by assembling a set of vectors representing specific types of perturbations of a data point.
Using neural frames, we make observations about the way that models process, layer-by-layer, specific modes of variation within a small neighborhood of a datapoint.
arXiv Detail & Related papers (2022-11-19T01:48:19Z) - Towards Interpretable Deep Reinforcement Learning Models via Inverse
Reinforcement Learning [27.841725567976315]
We propose a novel framework utilizing Adversarial Inverse Reinforcement Learning.
This framework provides global explanations for decisions made by a Reinforcement Learning model.
We capture intuitive tendencies that the model follows by summarizing the model's decision-making process.
arXiv Detail & Related papers (2022-03-30T17:01:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.