IBCL: Zero-shot Model Generation for Task Trade-offs in Continual
Learning
- URL: http://arxiv.org/abs/2310.02995v2
- Date: Thu, 5 Oct 2023 17:58:37 GMT
- Title: IBCL: Zero-shot Model Generation for Task Trade-offs in Continual
Learning
- Authors: Pengyuan Lu and Michele Caprio and Eric Eaton and Insup Lee
- Abstract summary: We propose Imprecise Bayesian Continual Learning (IBCL) to address task trade-off preferences.
IBCL does not require any additional training overhead to generate preference-addressing models from its knowledge base.
IBCL improves average per-task accuracy by up to 23% and peak per-task accuracy by up to 15% with respect to the baseline methods.
- Score: 15.77524891010002
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Like generic multi-task learning, continual learning has the nature of
multi-objective optimization, and therefore faces a trade-off between the
performance of different tasks. That is, to optimize for the current task
distribution, it may need to compromise performance on some previous tasks.
This means that there exist multiple models that are Pareto-optimal at
different times, each addressing a distinct task performance trade-off.
Researchers have discussed how to train particular models to address specific
trade-off preferences. However, existing algorithms require training overheads
proportional to the number of preferences -- a large burden when there are
multiple, possibly infinitely many, preferences. As a response, we propose
Imprecise Bayesian Continual Learning (IBCL). Upon a new task, IBCL (1) updates
a knowledge base in the form of a convex hull of model parameter distributions
and (2) obtains particular models to address task trade-off preferences in a
zero-shot manner. That is, IBCL does not require any additional training overhead to
generate preference-addressing models from its knowledge base. We show that
models obtained by IBCL have guarantees in identifying the Pareto optimal
parameters. Moreover, experiments on standard image classification and NLP
tasks support this guarantee. Statistically, IBCL improves average per-task
accuracy by up to 23% and peak per-task accuracy by up to 15% with
respect to the baseline methods, with consistently near-zero or positive backward
transfer. Most importantly, IBCL significantly reduces the training overhead
from training 1 model per preference to at most 3 models for all preferences.
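The zero-shot step can be pictured with a short sketch. The following is a minimal, hypothetical Python illustration (not the authors' implementation): it assumes each task's posterior over model parameters is approximated by a diagonal Gaussian cached as an extreme point of the convex-hull knowledge base, and it generates a preference-addressing parameter sample as a convex (mixture) combination of those posteriors, with no gradient updates. The names `KnowledgeBase` and `zero_shot_model` are invented for this sketch.

```python
import numpy as np

# Minimal sketch of IBCL-style zero-shot model generation (assumptions
# stated above; not the authors' code). The knowledge base stores one
# diagonal-Gaussian posterior per task; a preference is a probability
# vector over tasks; the preference-addressing distribution is the
# convex combination of the cached posteriors.

class KnowledgeBase:
    def __init__(self):
        self.means = []   # one mean vector per task posterior (extreme point)
        self.stds = []    # one std-dev vector per task posterior

    def update(self, mean, std):
        """Continual-learning step: add the new task's posterior as an
        extreme point of the convex hull."""
        self.means.append(np.asarray(mean, dtype=float))
        self.stds.append(np.asarray(std, dtype=float))

    def zero_shot_model(self, preference, rng=None):
        """Return parameters addressing a task trade-off preference.

        `preference` is a non-negative vector over tasks summing to 1.
        No gradient steps are taken: we only mix the cached posteriors.
        """
        rng = rng or np.random.default_rng()
        w = np.asarray(preference, dtype=float)
        assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
        # Sampling from the mixture sum_i w_i * q_i: pick a component by
        # preference weight, then draw parameters from that Gaussian.
        i = rng.choice(len(self.means), p=w)
        return rng.normal(self.means[i], self.stds[i])

# Usage: two tasks already learned; generate a model weighting task 0
# at 70% and task 1 at 30%, with zero additional training.
kb = KnowledgeBase()
kb.update(mean=[0.0, 1.0], std=[0.1, 0.1])   # posterior after task 0
kb.update(mean=[1.0, 0.0], std=[0.1, 0.1])   # posterior after task 1
theta = kb.zero_shot_model(preference=[0.7, 0.3])
```

Because every preference reuses the same cached posteriors, the per-preference cost is a draw from a mixture rather than a training run, which is the source of the claimed overhead reduction.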
Related papers
- On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion [23.63688816017186]
Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge.
We propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task.
Our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios.
arXiv Detail & Related papers (2024-06-17T03:07:41Z) - MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP)
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z) - Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods independently score and select data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z) - A Two-Phase Recall-and-Select Framework for Fast Model Selection [13.385915962994806]
We propose a two-phase (coarse-recall and fine-selection) model selection framework.
It aims to enhance the efficiency of selecting a robust model by leveraging the models' training performances on benchmark datasets.
The proposed methodology is shown to select a high-performing model about 3x faster than conventional baseline methods.
arXiv Detail & Related papers (2024-03-28T14:44:44Z) - Exploring Transferability for Randomized Smoothing [37.60675615521106]
We propose a method for pretraining certifiably robust models.
We find that surprisingly strong certified accuracy can be achieved even when finetuning on only clean images.
arXiv Detail & Related papers (2023-12-14T15:08:27Z) - How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
arXiv Detail & Related papers (2023-10-12T15:01:43Z) - Building a Winning Team: Selecting Source Model Ensembles using a
Submodular Transferability Estimation Approach [20.86345962679122]
Estimating the transferability of publicly available pretrained models to a target task has assumed an important place for transfer learning tasks.
We propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task.
arXiv Detail & Related papers (2023-09-05T17:57:31Z) - RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z) - Zero-shot Task Preference Addressing Enabled by Imprecise Bayesian
Continual Learning [19.11678487931003]
We propose Imprecise Bayesian Continual Learning (IBCL) to address preferences on task-performance trade-offs.
IBCL does not require any additional training overhead to construct preference-addressing models from its knowledge base.
We show that models obtained by IBCL have guarantees in identifying the preferred parameters.
arXiv Detail & Related papers (2023-05-24T06:39:00Z) - MILO: Model-Agnostic Subset Selection Framework for Efficient Model
Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training.
Our empirical results indicate that MILO can train models $3\times$-$10\times$ faster and tune hyperparameters $20\times$-$75\times$ faster than full-dataset training or tuning, without sacrificing performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)