DailyMAE: Towards Pretraining Masked Autoencoders in One Day
- URL: http://arxiv.org/abs/2404.00509v1
- Date: Sun, 31 Mar 2024 00:59:10 GMT
- Title: DailyMAE: Towards Pretraining Masked Autoencoders in One Day
- Authors: Jiantao Wu, Shentong Mo, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammad Awais,
- Abstract summary: Masked image modeling (MIM) has drawn attention for its effectiveness in learning data representation from unlabeled data.
In this study, we propose efficient training recipes for MIM based SSL that focuses on mitigating data loading bottlenecks.
Our library enables the training of a MAE-Base/16 model on the ImageNet 1K dataset for 800 epochs within just 18 hours.
- Score: 37.206816999538496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, masked image modeling (MIM), an important self-supervised learning (SSL) method, has drawn attention for its effectiveness in learning data representation from unlabeled data. Numerous studies underscore the advantages of MIM, highlighting how models pretrained on extensive datasets can enhance the performance of downstream tasks. However, the high computational demands of pretraining pose significant challenges, particularly within academic environments, thereby impeding the SSL research progress. In this study, we propose efficient training recipes for MIM based SSL that focuses on mitigating data loading bottlenecks and employing progressive training techniques and other tricks to closely maintain pretraining performance. Our library enables the training of a MAE-Base/16 model on the ImageNet 1K dataset for 800 epochs within just 18 hours, using a single machine equipped with 8 A100 GPUs. By achieving speed gains of up to 5.8 times, this work not only demonstrates the feasibility of conducting high-efficiency SSL training but also paves the way for broader accessibility and promotes advancement in SSL research particularly for prototyping and initial testing of SSL ideas. The code is available in https://github.com/erow/FastSSL.
Related papers
- Label-Efficient Sleep Staging Using Transformers Pre-trained with Position Prediction [2.591936982899312]
We propose an architecture that seamlessly couples the feature and temporal encoding and a suitable pretraining scheme that pretrains the entire model.
On a sample sleep staging dataset, we find that the proposed scheme offers performance gains that do not saturate with amount of labeled training data.
arXiv Detail & Related papers (2024-03-29T23:22:30Z) - On Pretraining Data Diversity for Self-Supervised Learning [57.91495006862553]
We explore the impact of training with more diverse datasets on the performance of self-supervised learning (SSL) under a fixed computational budget.
Our findings consistently demonstrate that increasing pretraining data diversity enhances SSL performance, albeit only when the distribution distance to the downstream data is minimal.
arXiv Detail & Related papers (2024-03-20T17:59:58Z) - Self-supervised learning for skin cancer diagnosis with limited training data [0.196629787330046]
Self-supervised learning (SSL) is an alternative to the standard supervised pre-training on ImageNet data for scenarios with limited training data.
We find that minimal further SSL pre-training on task-specific data can be as effective as large-scale SSL pre-training on ImageNet for medical image classification tasks with limited labelled data.
arXiv Detail & Related papers (2024-01-01T08:11:38Z) - Joint Prediction and Denoising for Large-scale Multilingual
Self-supervised Learning [69.77973092264338]
We show that more powerful techniques can lead to more efficient pre-training, opening SSL to more research groups.
We propose WavLabLM, which extends WavLM's joint prediction and denoising to 40k hours of data across 136 languages.
We show that further efficiency can be achieved with a vanilla HuBERT Base model, which can maintain 94% of XLS-R's performance with only 3% of the data.
arXiv Detail & Related papers (2023-09-26T23:55:57Z) - CroSSL: Cross-modal Self-Supervised Learning for Time-series through
Latent Masking [11.616031590118014]
CroSSL allows for handling missing modalities and end-to-end cross-modal learning.
We evaluate our method on a wide range of data, including motion sensors.
arXiv Detail & Related papers (2023-07-31T17:10:10Z) - Match to Win: Analysing Sequences Lengths for Efficient Self-supervised
Learning in Speech and Audio [19.865050806327147]
Self-supervised learning has proven vital in speech and audio-related applications.
This paper provides the first empirical study of SSL pre-training for different specified sequence lengths.
We find that training on short sequences can dramatically reduce resource costs while retaining a satisfactory performance for all tasks.
arXiv Detail & Related papers (2022-09-30T16:35:42Z) - Task-Customized Self-Supervised Pre-training with Scalable Dynamic
Routing [76.78772372631623]
A common practice for self-supervised pre-training is to use as much data as possible.
For a specific downstream task, however, involving irrelevant data in pre-training may degenerate the downstream performance.
It is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks.
arXiv Detail & Related papers (2022-05-26T10:49:43Z) - DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective NAS approach specialized for self-supervised learning (SSL)
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z) - Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining
Paradigms [36.04356511882304]
Self-supervised learning (SSL) has demonstrated promising results on a wide range of applications.
There has not been a clear understanding on what properties of data and tasks render one approach outperforms the other.
arXiv Detail & Related papers (2020-06-19T05:21:00Z) - TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot
classification [50.358839666165764]
We show that the Task-Adaptive Feature Sub-Space Learning (TAFSSL) can significantly boost the performance in Few-Shot Learning scenarios.
Specifically, we show that on the challenging miniImageNet and tieredImageNet benchmarks, TAFSSL can improve the current state-of-the-art in both transductive and semi-supervised FSL settings by more than $5%$.
arXiv Detail & Related papers (2020-03-14T16:59:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.