Mixtraining: A Better Trade-Off Between Compute and Performance
- URL: http://arxiv.org/abs/2502.19513v2
- Date: Wed, 05 Mar 2025 03:40:47 GMT
- Title: Mixtraining: A Better Trade-Off Between Compute and Performance
- Authors: Zexin Li, Jiancheng Zhang, Yufei Li, Yinglun Zhu, Cong Liu
- Abstract summary: MixTraining is a novel framework that interleaves several SSL and SL epochs within a unified MixTraining phase. It enhances synergy between SSL and SL for improved accuracy and consolidates shared steps to reduce computation overhead. Experiments demonstrate that MixTraining offers a superior compute-performance trade-off compared to conventional pipelines.
- Score: 19.187160313686782
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Incorporating self-supervised learning (SSL) before standard supervised learning (SL) has become a widely used strategy to enhance model performance, particularly in data-limited scenarios. However, this approach introduces a trade-off between computation and performance: while SSL helps with representation learning, it requires a separate, often time-consuming training phase, increasing computational overhead and limiting efficiency in resource-constrained settings. To address these challenges, we propose MixTraining, a novel framework that interleaves several SSL and SL epochs within a unified MixTraining phase, featuring a smooth transition between the two learning objectives. MixTraining enhances synergy between SSL and SL for improved accuracy and consolidates shared computation steps to reduce computation overhead. MixTraining is versatile and applicable to both single-task and multi-task learning scenarios. Extensive experiments demonstrate that MixTraining offers a superior compute-performance trade-off compared to conventional pipelines, achieving an 8.81% absolute accuracy gain (18.89% relative accuracy gain) on the TinyImageNet dataset while accelerating training by up to 1.29x with the ViT-Tiny model.
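The abstract describes interleaving SSL and SL within a single training phase, sharing computation and transitioning smoothly between the two objectives. Below is a minimal sketch of that idea, not the authors' implementation: it assumes a linear weighting schedule and a placeholder reconstruction-style SSL head, and the names `mix_weight`, `ssl_head`, and `sl_head` are illustrative.

```python
import torch.nn as nn

def mix_weight(epoch: int, total_epochs: int) -> float:
    """Assumed schedule: shift linearly from pure SSL (1.0) toward pure SL (0.0)."""
    return max(0.0, 1.0 - epoch / total_epochs)

def mixtraining_epoch(encoder, ssl_head, sl_head, loader, optimizer,
                      epoch, total_epochs):
    # Placeholder objectives: a reconstruction-style SSL loss and a standard
    # cross-entropy SL loss; the abstract does not pin down the exact choices.
    ssl_criterion = nn.MSELoss()
    sl_criterion = nn.CrossEntropyLoss()
    alpha = mix_weight(epoch, total_epochs)

    for images, labels in loader:
        feats = encoder(images)              # shared computation for both objectives
        ssl_loss = ssl_criterion(ssl_head(feats), images.flatten(1))
        sl_loss = sl_criterion(sl_head(feats), labels)
        loss = alpha * ssl_loss + (1.0 - alpha) * sl_loss   # smooth transition
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The single forward pass through `encoder` is what consolidates the computation that a separate SSL pretraining phase would otherwise duplicate.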
Related papers
- Uni-Sign: Toward Unified Sign Language Understanding at Scale [90.76641997060513]
We propose a unified pre-training framework that eliminates the gap between pre-training and downstream SLU tasks. Uni-Sign achieves state-of-the-art performance across multiple downstream SLU tasks.
arXiv Detail & Related papers (2025-01-25T11:51:23Z)
- An Empirical Study of Accuracy-Robustness Tradeoff and Training Efficiency in Self-Supervised Learning [1.590683264892176]
Self-supervised learning (SSL) has significantly advanced image representation learning, yet efficiency challenges persist. We revisit the robust EMP-SSL framework, emphasizing the importance of increasing the number of crops per image to accelerate learning. We extend this approach with free adversarial training in Multi-Crop SSL, introducing the Cost-Free Adversarial Multi-Crop Self-Supervised Learning (CF-AMC-SSL) method.
arXiv Detail & Related papers (2025-01-07T03:50:11Z)
- Self-supervised visual learning in the low-data regime: a comparative evaluation [38.34785825702943]
Self-Supervised Learning (SSL) is a valuable and robust training methodology for contemporary Deep Neural Networks (DNNs). It allows efficient representation learning from massive amounts of unlabeled training data. However, it is not always feasible to collect and/or utilize very large pretraining datasets.
arXiv Detail & Related papers (2024-04-26T07:23:14Z)
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study how predictable model performance is as a function of the mixture proportions.
We propose a nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens on RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z)
- On Pretraining Data Diversity for Self-Supervised Learning [57.91495006862553]
We explore the impact of training with more diverse datasets on the performance of self-supervised learning (SSL) under a fixed computational budget.
Our findings consistently demonstrate that increasing pretraining data diversity enhances SSL performance, albeit only when the distribution distance to the downstream data is minimal.
arXiv Detail & Related papers (2024-03-20T17:59:58Z)
- TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training [42.142924806184425]
Mixed data samples for cross-modal contrastive learning implicitly serve as a regularizer for the contrastive loss.
TiMix achieves performance comparable to existing methods on downstream tasks, even with less training data and shorter training time.
arXiv Detail & Related papers (2023-12-14T12:02:24Z)
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specific, auto-tuned learning rate scheduling converges and achieves linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Efficient and Effective Augmentation Strategy for Adversarial Training [48.735220353660324]
Adversarial training of Deep Neural Networks is known to be significantly more data-hungry than standard training.
We propose Diverse Augmentation-based Joint Adversarial Training (DAJAT) to use data augmentations effectively in adversarial training.
arXiv Detail & Related papers (2022-10-27T10:59:55Z)
- Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup [54.09898347820941]
We propose Scenario-Agnostic Mixup (SAMix) for both Self-supervised Learning (SSL) and supervised learning (SL) scenarios.
Specifically, we hypothesize and verify that the objective of mixup generation is to optimize local smoothness between the two mixed classes.
A label-free generation sub-network is designed, which effectively provides non-trivial mixup samples and improves transferability.
arXiv Detail & Related papers (2021-11-30T14:49:59Z)
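Several entries above (TiMix, SAMix) build on mixup-style sample generation. For reference, a vanilla mixup step looks roughly like the sketch below; SAMix replaces this hand-crafted mixing with a learned, label-free generation sub-network, and the Beta parameter and helper name here are illustrative.

```python
import torch

def mixup_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Vanilla mixup: convex combination of two samples; labels are mixed in the loss."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    return x_mixed, y, y[perm], lam

# Usage: the loss is mixed with the same coefficient, e.g.
#   x_mixed, y_a, y_b, lam = mixup_batch(images, labels)
#   loss = lam * criterion(model(x_mixed), y_a) + (1 - lam) * criterion(model(x_mixed), y_b)
```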