Batch Model Consolidation: A Multi-Task Model Consolidation Framework
- URL: http://arxiv.org/abs/2305.16484v1
- Date: Thu, 25 May 2023 21:33:56 GMT
- Title: Batch Model Consolidation: A Multi-Task Model Consolidation Framework
- Authors: Iordanis Fostiropoulos, Jiaye Zhu, Laurent Itti
- Abstract summary: In Continual Learning (CL), a model is required to learn a stream of tasks sequentially without significant performance degradation on previously learned tasks.
We propose Batch Model Consolidation ($\textbf{BMC}$) to support more realistic CL under conditions where multiple agents are exposed to a range of tasks.
Our method outperforms the next best CL approach by 70% and is the only approach that can maintain performance at the end of 71 tasks.
- Score: 14.687385545898776
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In Continual Learning (CL), a model is required to learn a stream of tasks
sequentially without significant performance degradation on previously learned
tasks. Current approaches fail for a long sequence of tasks from diverse
domains and difficulties. Many of the existing CL approaches are difficult to
apply in practice due to excessive memory cost or training time, or are tightly
coupled to a single device. With the intuition derived from the widely applied
mini-batch training, we propose Batch Model Consolidation ($\textbf{BMC}$) to
support more realistic CL under conditions where multiple agents are exposed to
a range of tasks. During a $\textit{regularization}$ phase, BMC trains multiple
$\textit{expert models}$ in parallel on a set of disjoint tasks. Each expert
maintains weight similarity to a $\textit{base model}$ through a
$\textit{stability loss}$, and constructs a $\textit{buffer}$ from a fraction
of the task's data. During the $\textit{consolidation}$ phase, we combine the
learned knowledge on 'batches' of $\textit{expert models}$ using a
$\textit{batched consolidation loss}$ in $\textit{memory}$ data that aggregates
all buffers. We thoroughly evaluate each component of our method in an ablation
study and demonstrate the effectiveness on standardized benchmark datasets
Split-CIFAR-100, Tiny-ImageNet, and the Stream dataset composed of 71 image
classification tasks from diverse domains and difficulties. Our method
outperforms the next best CL approach by 70% and is the only approach that can
maintain performance at the end of 71 tasks. Our benchmark can be accessed at
https://github.com/fostiropoulos/stream_benchmark
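
To make the two-phase procedure above concrete, the sketch below walks through one BMC round in PyTorch-style code: each expert minimizes its task loss plus a stability term that penalizes drift from the base model (written here as an L2 penalty, $L_{task} + \lambda\,\lVert\theta_{expert}-\theta_{base}\rVert_2^2$, which is an assumption about the exact form), and the consolidation step then distills the batch of experts into the base model on memory data aggregated from all buffers. Function names, the logit-distillation form of the batched consolidation loss, and all hyperparameters are illustrative choices, not the authors' implementation; refer to the repository linked above for the actual code.

```python
# Minimal sketch of one Batch Model Consolidation (BMC) round.
# Assumptions (not from the paper's code): an L2 stability penalty,
# a logit-distillation batched consolidation loss, and a shared output head.
import copy
import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader, Subset


def sample_fraction(dataset, frac):
    """Keep a random fraction of the task's data as that expert's buffer."""
    n = max(1, int(len(dataset) * frac))
    idx = torch.randperm(len(dataset))[:n].tolist()
    return Subset(dataset, idx)


def regularization_phase(base_model, task_loaders, lambda_stab=1.0, buffer_frac=0.1):
    """Train one expert per disjoint task (in parallel in BMC; sequential here
    for brevity), each tethered to the base model by a stability loss."""
    base_params = [p.detach().clone() for p in base_model.parameters()]
    experts, buffers = [], []
    for loader in task_loaders:
        expert = copy.deepcopy(base_model)
        opt = torch.optim.SGD(expert.parameters(), lr=1e-2)
        for x, y in loader:
            task_loss = F.cross_entropy(expert(x), y)
            # stability loss: keep expert weights close to the base model
            stab_loss = sum(((p - b) ** 2).sum()
                            for p, b in zip(expert.parameters(), base_params))
            loss = task_loss + lambda_stab * stab_loss
            opt.zero_grad(); loss.backward(); opt.step()
        experts.append(expert)
        buffers.append(sample_fraction(loader.dataset, buffer_frac))
    return experts, buffers


def consolidation_phase(base_model, experts, memory_loader, lambda_cons=1.0):
    """Consolidate a 'batch' of experts into the base model on memory data
    that aggregates all buffers (batched consolidation loss)."""
    opt = torch.optim.SGD(base_model.parameters(), lr=1e-2)
    for x, y in memory_loader:
        logits = base_model(x)
        # distill every expert's predictions into the base model at once
        cons_loss = sum(F.mse_loss(logits, e(x).detach()) for e in experts)
        loss = F.cross_entropy(logits, y) + lambda_cons * cons_loss
        opt.zero_grad(); loss.backward(); opt.step()
    return base_model


# One round over a batch of tasks; a full run repeats this over the task stream.
# experts, buffers = regularization_phase(base_model, task_loaders)
# memory_loader = DataLoader(ConcatDataset(buffers), batch_size=32, shuffle=True)
# base_model = consolidation_phase(base_model, experts, memory_loader)
```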
Related papers
- Towards Efficient Automatic Self-Pruning of Large Language Models [55.90119819642064]
Post-training structured pruning is a promising solution that prunes Large Language Models without the need for retraining.
We argue that the key to mitigating this issue lies in accurately determining the pruning rate for each layer.
We introduce $\textbf{Self-Pruner}$, an end-to-end automatic self-pruning framework for LLMs that efficiently searches for layer-wise pruning rates.
arXiv Detail & Related papers (2025-02-20T09:59:50Z) - Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models [77.79855507792564]
This paper revisits the implementation of the $\textbf{L}$oad-$\textbf{b}$alancing $\textbf{L}$oss (LBL) when training Mixture-of-Experts (MoE) models.
arXiv Detail & Related papers (2025-01-21T04:04:39Z) - $M^3EL$: A Multi-task Multi-topic Dataset for Multi-modal Entity Linking [11.334577756093923]
We propose a dataset construction pipeline and publish $M3EL$, a large-scale dataset for MEL.
$M3EL$ includes 79,625 instances, covering 9 diverse multi-modal tasks, and 5 different topics.
Our dataset effectively addresses these issues, and the $textitCLIP_textitND$ model fine-tuned with $M3EL$ shows a significant improvement in accuracy.
arXiv Detail & Related papers (2024-10-08T10:52:23Z) - In-Context Learning for Extreme Multi-Label Classification [29.627891261947536]
Multi-label classification problems with thousands of classes are hard to solve with in-context learning alone.
We propose a general program that defines multi-step interactions between LMs and retrievers to efficiently tackle such problems.
Our solution requires no finetuning, is easily applicable to new tasks, alleviates prompt engineering, and requires only tens of labeled examples.
arXiv Detail & Related papers (2024-01-22T18:09:52Z) - Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation [43.32632163091792]
Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions.
Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks.
MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios.
arXiv Detail & Related papers (2023-10-04T14:11:12Z) - Simplifying and Understanding State Space Models with Diagonal Linear
RNNs [56.33053691749856]
This work disposes of the discretization step, and proposes a model based on vanilla Diagonal Linear RNNs.
We empirically show that, despite being conceptually much simpler, $\mathrm{DLR}$ is as performant as previously-proposed SSMs.
We also characterize the expressivity of SSMs and attention-based models via a suite of $13$ synthetic sequence-to-sequence tasks.
arXiv Detail & Related papers (2022-12-01T18:53:06Z) - Data-Centric Debugging: mitigating model failures via targeted data
collection [4.599792546344752]
Deep neural networks can be unreliable in the real world when the training set does not adequately cover all the settings where they are deployed.
We propose a general methodology for model debugging that can systematically improve model performance on $\mathcal{E}$ while maintaining its performance on the original test set.
arXiv Detail & Related papers (2022-11-17T19:44:02Z) - Cross-Modal Adapter for Text-Video Retrieval [91.9575196703281]
We present a novel $\textbf{Cross-Modal Adapter}$ for parameter-efficient fine-tuning.
Inspired by adapter-based methods, we adjust the pre-trained model with a few parameterization layers.
It achieves superior or comparable performance compared to fully fine-tuned methods on MSR-VTT, MSVD, VATEX, ActivityNet, and DiDeMo datasets.
arXiv Detail & Related papers (2022-11-17T16:15:30Z) - DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
arXiv Detail & Related papers (2022-06-09T17:57:46Z) - vCLIMB: A Novel Video Class Incremental Learning Benchmark [53.90485760679411]
We introduce vCLIMB, a novel video continual learning benchmark.
vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning.
We propose a temporal consistency regularization that can be applied on top of memory-based continual learning methods.
arXiv Detail & Related papers (2022-01-23T22:14:17Z) - IDEAL: Independent Domain Embedding Augmentation Learning [8.376337907951012]
We develop a novel mechanism, the independent domain embedding augmentation learning (IDEAL) method.
It can simultaneously learn multiple independent embedding spaces for transformations generated by multiple data domains.
Our IDEAL method is orthogonal to existing DML techniques and can be seamlessly combined with prior DML approaches for enhanced performance.
arXiv Detail & Related papers (2021-05-21T03:40:24Z)