Batch Model Consolidation: A Multi-Task Model Consolidation Framework
- URL: http://arxiv.org/abs/2305.16484v1
- Date: Thu, 25 May 2023 21:33:56 GMT
- Title: Batch Model Consolidation: A Multi-Task Model Consolidation Framework
- Authors: Iordanis Fostiropoulos, Jiaye Zhu, Laurent Itti
- Abstract summary: In Continual Learning (CL), a model is required to learn a stream of tasks sequentially without significant performance degradation on previously learned tasks.
We propose Batch Model Consolidation ($\textbf{BMC}$) to support more realistic CL under conditions where multiple agents are exposed to a range of tasks.
Our method outperforms the next best CL approach by 70% and is the only approach that can maintain performance at the end of 71 tasks.
- Score: 14.687385545898776
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In Continual Learning (CL), a model is required to learn a stream of tasks
sequentially without significant performance degradation on previously learned
tasks. Current approaches fail for a long sequence of tasks from diverse
domains and difficulties. Many of the existing CL approaches are difficult to
apply in practice due to excessive memory cost or training time, or are tightly
coupled to a single device. With the intuition derived from the widely applied
mini-batch training, we propose Batch Model Consolidation ($\textbf{BMC}$) to
support more realistic CL under conditions where multiple agents are exposed to
a range of tasks. During a $\textit{regularization}$ phase, BMC trains multiple
$\textit{expert models}$ in parallel on a set of disjoint tasks. Each expert
maintains weight similarity to a $\textit{base model}$ through a
$\textit{stability loss}$, and constructs a $\textit{buffer}$ from a fraction
of the task's data. During the $\textit{consolidation}$ phase, we combine the
learned knowledge on 'batches' of $\textit{expert models}$ using a
$\textit{batched consolidation loss}$ in $\textit{memory}$ data that aggregates
all buffers. We thoroughly evaluate each component of our method in an ablation
study and demonstrate the effectiveness on standardized benchmark datasets
Split-CIFAR-100, Tiny-ImageNet, and the Stream dataset composed of 71 image
classification tasks from diverse domains and difficulties. Our method
outperforms the next best CL approach by 70% and is the only approach that can
maintain performance at the end of 71 tasks. Our benchmark can be accessed at
https://github.com/fostiropoulos/stream_benchmark
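
To make the two-phase procedure above concrete, the sketch below walks through one BMC round in PyTorch-style code: each expert minimizes its task loss plus a stability term that penalizes drift from the base model (written here as an L2 penalty, $L_{task} + \lambda\,\lVert\theta_{expert}-\theta_{base}\rVert_2^2$, which is an assumption about the exact form), and the consolidation step then distills the batch of experts into the base model on memory data aggregated from all buffers. Function names, the logit-distillation form of the batched consolidation loss, and all hyperparameters are illustrative choices, not the authors' implementation; refer to the repository linked above for the actual code.

```python
# Minimal sketch of one Batch Model Consolidation (BMC) round.
# Assumptions (not from the paper's code): an L2 stability penalty,
# a logit-distillation batched consolidation loss, and a shared output head.
import copy
import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader, Subset


def sample_fraction(dataset, frac):
    """Keep a random fraction of the task's data as that expert's buffer."""
    n = max(1, int(len(dataset) * frac))
    idx = torch.randperm(len(dataset))[:n].tolist()
    return Subset(dataset, idx)


def regularization_phase(base_model, task_loaders, lambda_stab=1.0, buffer_frac=0.1):
    """Train one expert per disjoint task (in parallel in BMC; sequential here
    for brevity), each tethered to the base model by a stability loss."""
    base_params = [p.detach().clone() for p in base_model.parameters()]
    experts, buffers = [], []
    for loader in task_loaders:
        expert = copy.deepcopy(base_model)
        opt = torch.optim.SGD(expert.parameters(), lr=1e-2)
        for x, y in loader:
            task_loss = F.cross_entropy(expert(x), y)
            # stability loss: keep expert weights close to the base model
            stab_loss = sum(((p - b) ** 2).sum()
                            for p, b in zip(expert.parameters(), base_params))
            loss = task_loss + lambda_stab * stab_loss
            opt.zero_grad(); loss.backward(); opt.step()
        experts.append(expert)
        buffers.append(sample_fraction(loader.dataset, buffer_frac))
    return experts, buffers


def consolidation_phase(base_model, experts, memory_loader, lambda_cons=1.0):
    """Consolidate a 'batch' of experts into the base model on memory data
    that aggregates all buffers (batched consolidation loss)."""
    opt = torch.optim.SGD(base_model.parameters(), lr=1e-2)
    for x, y in memory_loader:
        logits = base_model(x)
        # distill every expert's predictions into the base model at once
        cons_loss = sum(F.mse_loss(logits, e(x).detach()) for e in experts)
        loss = F.cross_entropy(logits, y) + lambda_cons * cons_loss
        opt.zero_grad(); loss.backward(); opt.step()
    return base_model


# One round over a batch of tasks; a full run repeats this over the task stream.
# experts, buffers = regularization_phase(base_model, task_loaders)
# memory_loader = DataLoader(ConcatDataset(buffers), batch_size=32, shuffle=True)
# base_model = consolidation_phase(base_model, experts, memory_loader)
```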
Related papers
- Towards Efficient Automatic Self-Pruning of Large Language Models [55.90119819642064]
Post-training structured pruning is a promising solution that prunes Large Language Models without the need for retraining.
We argue that the key to mitigating this issue lies in accurately determining the pruning rate for each layer.
We introduce $\textbf{Self-Pruner}$, an end-to-end automatic self-pruning framework for LLMs that efficiently searches for layer-wise pruning rates.
arXiv Detail & Related papers (2025-02-20T09:59:50Z) - Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models [77.79855507792564]
This paper revisits the implementation of the $\textbf{L}$oad-$\textbf{b}$alancing $\textbf{L}$oss (LBL) when training Mixture-of-Experts (MoE) models.
arXiv Detail & Related papers (2025-01-21T04:04:39Z) - $M^3EL$: A Multi-task Multi-topic Dataset for Multi-modal Entity Linking [11.334577756093923]
We propose a dataset construction pipeline and publish $M3EL$, a large-scale dataset for MEL.
$M3EL$ includes 79,625 instances, covering 9 diverse multi-modal tasks, and 5 different topics.
Our dataset effectively addresses these issues, and the $textitCLIP_textitND$ model fine-tuned with $M3EL$ shows a significant improvement in accuracy.
arXiv Detail & Related papers (2024-10-08T10:52:23Z) - In-Context Learning for Extreme Multi-Label Classification [29.627891261947536]
Multi-label classification problems with thousands of classes are hard to solve with in-context learning alone.
We propose a general program that defines multi-step interactions between LMs and retrievers to efficiently tackle such problems.
Our solution requires no finetuning, is easily applicable to new tasks, alleviates prompt engineering, and requires only tens of labeled examples.
arXiv Detail & Related papers (2024-01-22T18:09:52Z) - Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation [43.32632163091792]
Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions.
Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks.
MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios.
arXiv Detail & Related papers (2023-10-04T14:11:12Z) - Simplifying and Understanding State Space Models with Diagonal Linear
RNNs [56.33053691749856]
This work disposes of the discretization step, and proposes a model based on vanilla Diagonal Linear RNNs.
We empirically show that, despite being conceptually much simpler, $\mathrm{DLR}$ is as performant as previously-proposed SSMs.
We also characterize the expressivity of SSMs and attention-based models via a suite of $13$ synthetic sequence-to-sequence tasks.
arXiv Detail & Related papers (2022-12-01T18:53:06Z) - Data-Centric Debugging: mitigating model failures via targeted data
collection [4.599792546344752]
Deep neural networks can be unreliable in the real world when the training set does not adequately cover all the settings where they are deployed.
We propose a general methodology for model debugging that can systematically improve model performance on $\mathcal{E}$ while maintaining its performance on the original test set.
arXiv Detail & Related papers (2022-11-17T19:44:02Z) - Cross-Modal Adapter for Text-Video Retrieval [91.9575196703281]
We present a novel $\textbf{Cross-Modal Adapter}$ for parameter-efficient fine-tuning.
Inspired by adapter-based methods, we adjust the pre-trained model with a few parameterization layers.
It achieves superior or comparable performance compared to fully fine-tuned methods on MSR-VTT, MSVD, VATEX, ActivityNet, and DiDeMo datasets.
arXiv Detail & Related papers (2022-11-17T16:15:30Z) - DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
arXiv Detail & Related papers (2022-06-09T17:57:46Z) - vCLIMB: A Novel Video Class Incremental Learning Benchmark [53.90485760679411]
We introduce vCLIMB, a novel video continual learning benchmark.
vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning.
We propose a temporal consistency regularization that can be applied on top of memory-based continual learning methods.
arXiv Detail & Related papers (2022-01-23T22:14:17Z) - IDEAL: Independent Domain Embedding Augmentation Learning [8.376337907951012]
We develop a novel mechanism, the independent domain embedding augmentation learning (IDEAL) method.
It can simultaneously learn multiple independent embedding spaces for transformations generated by multiple data domains.
Our IDEAL method is orthogonal to existing DML techniques and can be seamlessly combined with prior DML approaches for enhanced performance.
arXiv Detail & Related papers (2021-05-21T03:40:24Z)