Step Out and Seek Around: On Warm-Start Training with Incremental Data
- URL: http://arxiv.org/abs/2406.04484v1
- Date: Thu, 6 Jun 2024 20:12:55 GMT
- Title: Step Out and Seek Around: On Warm-Start Training with Incremental Data
- Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jose M. Alvarez
- Abstract summary: Data often arrives in sequence over time in real-world deep learning applications such as autonomous driving.
Warm-starting from a previously trained checkpoint is the most intuitive way to retain knowledge and advance learning.
We propose Knowledge Consolidation and Acquisition (CKCA), a continuous model improvement algorithm with two novel components.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data often arrives in sequence over time in real-world deep learning applications such as autonomous driving. When new training data becomes available, training the model from scratch undermines the benefit of leveraging the learned knowledge, leading to significant training costs. Warm-starting from a previously trained checkpoint is the most intuitive way to retain knowledge and advance learning. However, existing literature suggests that this warm-starting degrades generalization. In this paper, we advocate for warm-starting but stepping out of the previous convergence point, thus allowing a better adaptation to new data without compromising previous knowledge. We propose Knowledge Consolidation and Acquisition (CKCA), a continuous model improvement algorithm with two novel components: first, feature regularization (FeatReg) to retain and refine knowledge from existing checkpoints; second, adaptive knowledge distillation (AdaKD), a novel approach to forgetting mitigation and knowledge transfer. We tested our method on ImageNet using multiple splits of the training data. Our approach achieves up to $8.39\%$ higher top-1 accuracy than vanilla warm-starting and consistently outperforms the prior art by a large margin.
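The abstract names two loss components, FeatReg and AdaKD, without giving their formulas. Below is a minimal PyTorch sketch of how such a warm-start objective could be wired together; the MSE-based feature regularizer, the temperature-scaled distillation term, the constant weight `alpha`, and the `features`/`classifier` interface are all illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def warm_start_loss(model, ckpt_model, x, y, alpha=0.5, temp=2.0):
    """Illustrative warm-start objective: task loss, plus a FeatReg-style
    feature regularizer toward the frozen checkpoint, plus an AdaKD-style
    distillation term. All weights and forms here are assumptions."""
    feats = model.features(x)                # assumed .features()/.classifier() split
    logits = model.classifier(feats)
    with torch.no_grad():                    # the old checkpoint stays frozen
        old_feats = ckpt_model.features(x)
        old_logits = ckpt_model.classifier(old_feats)

    task = F.cross_entropy(logits, y)        # learn the new data
    feat_reg = F.mse_loss(feats, old_feats)  # retain checkpoint features (FeatReg-like)
    kd = F.kl_div(                           # distill old predictions (AdaKD-like)
        F.log_softmax(logits / temp, dim=1),
        F.softmax(old_logits / temp, dim=1),
        reduction="batchmean",
    ) * temp ** 2
    return task + feat_reg + alpha * kd
```

Note that AdaKD, per the abstract, adapts the distillation behavior during training; the constant `alpha` above does not capture that.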
Related papers
- Why pre-training is beneficial for downstream classification tasks?
We propose to quantitatively and explicitly explain the effects of pre-training on the downstream task from a novel game-theoretic view.
Specifically, we extract and quantify the knowledge encoded by the pre-trained model.
We discover that only a small amount of the pre-trained model's knowledge is preserved for the inference of downstream tasks.
arXiv Detail & Related papers (2024-10-11T02:13:16Z)
- Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning
We propose Continual Adapter (C-ADA), an approach to the RFCL task that goes beyond prompt learning.
C-ADA flexibly extends specific weights in its continual adapter layer (CAL) to learn new knowledge for each task and freezes old weights to preserve prior knowledge.
Our approach achieves significantly improved performance and training speed, outperforming the current state-of-the-art (SOTA) method.
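A minimal sketch of the extend-and-freeze idea described above, assuming a residual bottleneck adapter (the class name, dimensions, and branch structure are hypothetical; the paper's actual CAL may differ):

```python
import torch
import torch.nn as nn

class ContinualAdapter(nn.Module):
    """Hypothetical extend-and-freeze adapter: each new task adds a fresh
    trainable bottleneck branch while all earlier weights are frozen."""

    def __init__(self, dim: int, task_dim: int = 16):
        super().__init__()
        self.dim, self.task_dim = dim, task_dim
        self.downs = nn.ModuleList()   # one down-projection per task
        self.ups = nn.ModuleList()     # one up-projection per task

    def add_task(self):
        for p in self.parameters():    # freeze weights from earlier tasks
            p.requires_grad_(False)
        self.downs.append(nn.Linear(self.dim, self.task_dim))
        self.ups.append(nn.Linear(self.task_dim, self.dim))

    def forward(self, x):
        # Residual adapter: sum the contributions of all task branches.
        return x + sum(up(torch.relu(down(x)))
                       for down, up in zip(self.downs, self.ups))
```

Calling `add_task()` before each new task means only the newest branch receives gradients, so prior-task weights cannot be overwritten.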
arXiv Detail & Related papers (2024-07-14T17:40:40Z)
- Adaptively Integrated Knowledge Distillation and Prediction Uncertainty for Continual Learning
Current deep learning models often suffer from catastrophic forgetting of old knowledge when continually learning new knowledge.
Existing strategies to alleviate this issue often fix the trade-off between keeping old knowledge (stability) and learning new knowledge (plasticity).
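One plausible way to make that trade-off adaptive is to scale the per-sample distillation strength by the old model's prediction confidence, sketched below; the entropy-based weighting is an illustrative assumption, not the paper's exact formulation.

```python
import math
import torch.nn.functional as F

def uncertainty_weighted_kd(new_logits, old_logits, temp=2.0):
    """Illustrative adaptive trade-off: confident old predictions are
    strongly preserved (stability) while uncertain ones leave room for
    new learning (plasticity)."""
    old_probs = F.softmax(old_logits / temp, dim=1)
    entropy = -(old_probs * old_probs.clamp_min(1e-8).log()).sum(dim=1)
    weight = 1.0 - entropy / math.log(old_logits.size(1))  # 1 = fully confident
    kd = F.kl_div(
        F.log_softmax(new_logits / temp, dim=1),
        old_probs,
        reduction="none",
    ).sum(dim=1) * temp ** 2                               # per-sample KD loss
    return (weight * kd).mean()
```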
arXiv Detail & Related papers (2023-01-18T05:36:06Z)
- PIVOT: Prompting for Video Continual Learning
We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain.
Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
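A minimal sketch of prompting a frozen pre-trained image backbone, in the spirit of the summary above (the module name, prompt count, and encoder interface are assumptions; PIVOT's actual design, including its video-specific components, is more involved):

```python
import torch
import torch.nn as nn

class PromptedFrozenEncoder(nn.Module):
    """Hypothetical prompting setup: a frozen pre-trained transformer is
    steered by a small set of learnable prompt tokens. The encoder is
    assumed to take (batch, seq, dim) inputs, e.g. an nn.TransformerEncoder
    built from batch_first=True layers."""

    def __init__(self, encoder: nn.Module, dim: int, n_prompts: int = 8):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # keep pre-trained weights intact
            p.requires_grad_(False)
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)

    def forward(self, frame_tokens):          # frame_tokens: (batch, seq, dim)
        b = frame_tokens.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        return self.encoder(torch.cat([prompts, frame_tokens], dim=1))
```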
arXiv Detail & Related papers (2022-12-09T13:22:27Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
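As a rough illustration of an "easier inputs first" curriculum, the sketch below grows the input resolution linearly over training. This is a simplified stand-in of my own (the schedule and constants are assumptions; EfficientTrain's actual curriculum operates on frequency content and augmentation strength rather than plain resizing):

```python
import torch
import torch.nn.functional as F

def curriculum_resolution(epoch: int, total_epochs: int,
                          min_res: int = 96, full_res: int = 224) -> int:
    """Hypothetical schedule: grow the input resolution linearly from
    min_res to full_res over training, snapped to multiples of 32."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    res = int(min_res + frac * (full_res - min_res))
    return max(min_res, (res // 32) * 32)

def resize_batch(images: torch.Tensor, res: int) -> torch.Tensor:
    # Downsample the batch to the scheduled resolution before the forward pass.
    return F.interpolate(images, size=(res, res),
                         mode="bilinear", align_corners=False)
```

Early epochs then cost far less compute per image, which is where the wall-time savings come from.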
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- A Memory Transformer Network for Incremental Learning
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing approaches uses a memory of exemplars: a subset of past data is saved into a memory bank and replayed when training on future tasks to prevent catastrophic forgetting.
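A minimal sketch of such an exemplar memory (per-class reservoir sampling is an assumption; published methods often use herding-based selection instead):

```python
import random
from collections import defaultdict

class ExemplarMemory:
    """Hypothetical exemplar memory: keep up to per_class past samples per
    class via reservoir sampling and replay them alongside new-task data."""

    def __init__(self, per_class: int = 20):
        self.per_class = per_class
        self.store = defaultdict(list)   # label -> stored exemplars
        self.seen = defaultdict(int)     # label -> samples observed so far

    def add(self, sample, label):
        self.seen[label] += 1
        bucket = self.store[label]
        if len(bucket) < self.per_class:
            bucket.append(sample)
        else:
            # Reservoir sampling: every observed sample is kept with equal probability.
            i = random.randrange(self.seen[label])
            if i < self.per_class:
                bucket[i] = sample

    def replay(self, k: int):
        # Draw up to k stored (sample, label) pairs to mix into a training batch.
        pool = [(s, y) for y, bucket in self.store.items() for s in bucket]
        return random.sample(pool, min(k, len(pool)))
```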
arXiv Detail & Related papers (2022-10-10T08:27:28Z)
- Informed Pre-Training on Prior Knowledge
When training data is scarce, the incorporation of additional prior knowledge can assist the learning process.
In this paper, we propose a novel informed machine learning approach and suggest to pre-train on prior knowledge.
arXiv Detail & Related papers (2022-05-23T16:24:40Z)
- EXPANSE: A Deep Continual / Progressive Learning System for Deep Transfer Learning
Current DTL techniques suffer from either the catastrophic forgetting dilemma or overly biased pre-trained models.
We propose a new continual/progressive learning approach for deep transfer learning to tackle these limitations.
We offer a new way of training deep learning models inspired by the human education system.
arXiv Detail & Related papers (2022-05-19T03:54:58Z)
- Preserve Pre-trained Knowledge: Transfer Learning With Self-Distillation For Action Recognition
We propose a novel transfer learning approach that incorporates self-distillation into fine-tuning to preserve knowledge from the pre-trained model learned on the large-scale dataset.
Specifically, we fix the encoder from the last epoch as the teacher model to guide the training of the current-epoch encoder during transfer learning.
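A minimal sketch of that last-epoch-teacher scheme (the feature-MSE distillation term, its weight `beta`, and the `encoder`/`head` split are assumptions; the paper's exact loss may differ):

```python
import copy
import torch
import torch.nn.functional as F

def self_distill_epoch(encoder, head, loader, optimizer, beta=1.0):
    """One fine-tuning epoch guided by a frozen copy of the encoder from
    the previous epoch, per the summary above."""
    teacher = copy.deepcopy(encoder).eval()      # snapshot of last epoch's encoder
    for p in teacher.parameters():
        p.requires_grad_(False)
    for x, y in loader:
        feats = encoder(x)
        with torch.no_grad():
            teacher_feats = teacher(x)
        loss = (F.cross_entropy(head(feats), y)             # task loss on new data
                + beta * F.mse_loss(feats, teacher_feats))  # stay close to last epoch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```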
arXiv Detail & Related papers (2022-05-01T16:31:25Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets while requiring 10x less data and 5x less pre-training time.
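A minimal sketch of feature-level distillation used as pre-training (the projection head and MSE objective are assumptions; the method's actual feature-mimicking loss may differ):

```python
import torch
import torch.nn.functional as F

def kd_pretrain_step(student, teacher, proj, x, optimizer):
    """Hypothetical feature-mimicking pre-training step: the student (plus a
    small projection head `proj`) learns to reproduce the frozen teacher's
    features; no labels are required."""
    with torch.no_grad():
        target = teacher(x)   # features from the frozen pre-trained teacher
    loss = F.mse_loss(proj(student(x)), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```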
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning
We consider the high-impact problem of Data-Free Class-Incremental Learning (DFCIL).
We propose a novel incremental distillation strategy for DFCIL, contributing a modified cross-entropy training and importance-weighted feature distillation.
Our method results in up to a 25.1% increase in final task accuracy (absolute difference) compared to SOTA DFCIL methods for common class-incremental benchmarks.
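A minimal sketch of the two loss terms named above, reflecting one reading of the summary (the restriction of cross-entropy to new-class logits and the per-channel importance weights are assumptions; the paper's exact formulation, and its data-free synthesis of old-class images, are not reproduced here):

```python
import torch.nn.functional as F

def dfcil_loss(logits, y, feats, old_feats, importance, n_old: int):
    """Illustrative DFCIL-style objective: modified cross-entropy on the
    new classes plus importance-weighted feature distillation against the
    previous model."""
    # Modified CE: new data is scored only against the new-class logits
    # (labels y are assumed to be indexed from n_old upward).
    new_ce = F.cross_entropy(logits[:, n_old:], y - n_old)
    # Importance-weighted feature distillation toward the old model;
    # `importance` is a per-channel weight tensor.
    feat_distill = (importance * (feats - old_feats) ** 2).mean()
    return new_ce + feat_distill
```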
arXiv Detail & Related papers (2021-06-17T17:56:08Z)