BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling
- URL: http://arxiv.org/abs/2311.17218v1
- Date: Tue, 28 Nov 2023 20:42:30 GMT
- Title: BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling
- Authors: Yixuan Luo, Mengye Ren, Sai Qian Zhang
- Abstract summary: Masked image modeling (MIM) aims to extract valuable insights from image patches to enhance the feature extraction capabilities of the underlying deep neural network (DNN)
- Score: 18.861945284506028
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Like masked language modeling (MLM) in natural language processing, masked
image modeling (MIM) aims to extract valuable insights from image patches to
enhance the feature extraction capabilities of the underlying deep neural
network (DNN). Contrasted with other training paradigms like supervised
learning and unsupervised contrastive learning, masked image modeling (MIM)
pretraining typically demands significant computational resources in order to
manage large training data batches (e.g., 4096). The significant memory and
computation requirements pose a considerable challenge to its broad adoption.
To mitigate this, we introduce a novel learning framework,
termed~\textit{Block-Wise Masked Image Modeling} (BIM). This framework involves
decomposing the MIM tasks into several sub-tasks with independent computation
patterns, resulting in block-wise back-propagation operations instead of the
traditional end-to-end approach. Our proposed BIM maintains superior
performance compared to conventional MIM while greatly reducing peak memory
consumption. Moreover, BIM naturally enables the concurrent training of
numerous DNN backbones of varying depths. This leads to the creation of
multiple trained DNN backbones, each tailored to different hardware platforms
with distinct computing capabilities. This approach significantly reduces
computational costs in comparison with training each DNN backbone individually.
Our framework offers a promising solution for resource constrained training of
MIM.
Related papers
- Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset [66.15872913664407]
This study introduces textbfRS-4M, a large-scale dataset designed to enable highly efficient MIM training on RS images.
We propose an efficient MIM method, termed textbfSelectiveMAE, which dynamically encodes and reconstructs a subset of patch tokens selected based on their semantic richness.
Experiments show that SelectiveMAE significantly boosts training efficiency by 2.2-2.7 times and enhances the classification, detection, and segmentation performance of the baseline MIM model.
arXiv Detail & Related papers (2024-06-17T15:41:57Z) - MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [103.72844619581811]
We build performant Multimodal Large Language Models (MLLMs)
In particular, we study the importance of various architecture components and data choices.
We demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data.
arXiv Detail & Related papers (2024-03-14T17:51:32Z) - Deep learning enhanced mixed integer optimization: Learning to reduce model dimensionality [0.0]
This work introduces a framework to address the computational complexity inherent in Mixed-Integer Programming.
By employing deep learning, we construct problem-specific models that identify and exploit common structures across MIP instances.
We present an algorithm for generating synthetic data enhancing the robustness and generalizability of our models.
arXiv Detail & Related papers (2024-01-17T19:15:13Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement
Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling [83.67628239775878]
Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.
This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction.
We propose a remarkably simple and effective method, ourmethod, that entails two strategies.
arXiv Detail & Related papers (2023-03-04T13:38:51Z) - Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z) - Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining [37.2106265998237]
We propose an effective learning procedure named Meta Fine-Tuning (MFT)
MFT serves as a meta-learner to solve a group of similar NLP tasks for neural language models.
We implement MFT upon BERT to solve several multi-domain text mining tasks.
arXiv Detail & Related papers (2020-03-29T11:27:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.