Zero-shot Task Preference Addressing Enabled by Imprecise Bayesian Continual Learning
- URL: http://arxiv.org/abs/2305.14782v1
- Date: Wed, 24 May 2023 06:39:00 GMT
- Title: Zero-shot Task Preference Addressing Enabled by Imprecise Bayesian Continual Learning
- Authors: Pengyuan Lu and Michele Caprio and Eric Eaton and Insup Lee
- Abstract summary: We propose Imprecise Bayesian Continual Learning (IBCL) to address preferences on task-performance trade-offs.
IBCL does not require any additional training overhead to construct preference-addressing models from its knowledge base.
We show that models obtained by IBCL have guarantees in identifying the preferred parameters.
- Score: 19.11678487931003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Like generic multi-task learning, continual learning has the nature of
multi-objective optimization, and therefore faces a trade-off between the
performance of different tasks. That is, to optimize for the current task
distribution, it may need to compromise performance on some tasks to improve on
others. This means there exist multiple models that are each optimal at
different times, each addressing a distinct task-performance trade-off.
Researchers have discussed how to train particular models to address specific
preferences on these trade-offs. However, existing algorithms require
additional sample overheads -- a large burden when there are multiple, possibly
infinitely many, preferences. As a response, we propose Imprecise Bayesian
Continual Learning (IBCL). Upon a new task, IBCL (1) updates a knowledge base
in the form of a convex hull of model parameter distributions and (2) obtains
particular models that address preferences in a zero-shot manner. That is, IBCL does not
require any additional training overhead to construct preference-addressing
models from its knowledge base. We show that models obtained by IBCL have
guarantees in identifying the preferred parameters. Moreover, experiments show
that IBCL is able to locate the Pareto set of parameters given a preference,
maintain performance similar to or better than baseline methods, and significantly
reduce training overhead via zero-shot preference addressing.
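The abstract describes a knowledge base that is a convex hull of parameter distributions, from which a preference-addressing model is obtained without further training. The following is a minimal sketch of that idea under assumptions that are ours, not the authors': each extreme point is a hypothetical one-dimensional Gaussian over a single parameter, a preference is a weight vector on the probability simplex, and the "guarantee" is illustrated with a Monte Carlo highest density region (HDR) of the preference-weighted mixture. All names (`Gaussian`, `mixture_pdf`, `hdr_threshold`, `in_hdr`) are hypothetical and do not come from the IBCL code.

```python
import math
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical extreme point of the knowledge base: a 1-D Gaussian
    over a single model parameter (a simplifying assumption)."""
    mean: float
    std: float

    def pdf(self, x: float) -> float:
        z = (x - self.mean) / self.std
        return math.exp(-0.5 * z * z) / (self.std * math.sqrt(2.0 * math.pi))


def check_preference(weights: Sequence[float]) -> None:
    """A preference is a point on the probability simplex: w_j >= 0 and sum_j w_j = 1."""
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("preference weights must be non-negative and sum to 1")


def mixture_pdf(extremes: Sequence[Gaussian], weights: Sequence[float], x: float) -> float:
    """Density at x of the convex combination sum_j w_j * Q_j, one member of the
    convex hull spanned by the extreme points. Only arithmetic, no training."""
    return sum(w * q.pdf(x) for w, q in zip(weights, extremes))


def hdr_threshold(extremes: Sequence[Gaussian], weights: Sequence[float],
                  alpha: float, n: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the density level t such that the region
    {x : p(x) >= t} holds roughly 1 - alpha of the mixture's probability mass."""
    check_preference(weights)
    rng = random.Random(seed)
    # Sample the mixture: choose component j with probability w_j, then draw from it.
    components = rng.choices(extremes, weights=weights, k=n)
    samples = [rng.gauss(q.mean, q.std) for q in components]
    # The alpha-quantile of the densities at the samples is the HDR cut-off.
    densities = sorted(mixture_pdf(extremes, weights, x) for x in samples)
    return densities[int(alpha * n)]


def in_hdr(extremes: Sequence[Gaussian], weights: Sequence[float],
           x: float, threshold: float) -> bool:
    """True if parameter value x lies inside the estimated highest density region."""
    return mixture_pdf(extremes, weights, x) >= threshold


# Usage: two hypothetical extreme points, one favoring task 1, one favoring task 2.
kb = [Gaussian(mean=-2.0, std=0.5), Gaussian(mean=2.0, std=0.5)]
pref = [0.8, 0.2]  # a preference that leans toward task 1
t = hdr_threshold(kb, pref, alpha=0.1)
print(in_hdr(kb, pref, -2.0, t), in_hdr(kb, pref, 0.0, t))  # prints: True False
```

Changing `pref` only re-weights distributions that already exist in the knowledge base; nothing is retrained, which is the sense in which preference addressing is zero-shot in this sketch.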
Related papers
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
Model merging is an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model.
Existing model-merging methods focus on enhancing average task accuracy.
We introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
Date: 2024-06-11T17:55:25Z
- Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods score and select data independently, in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
Date: 2024-06-07T12:12:20Z
- IBCL: Zero-shot Model Generation for Task Trade-offs in Continual Learning [15.77524891010002]
We propose Imprecise Bayesian Continual Learning (IBCL) to address task trade-off preferences.
IBCL does not require any additional training overhead to generate preference-addressing models from its knowledge base.
IBCL improves average per-task accuracy by at most 23% and peak per-task accuracy by at most 15% with respect to the baseline methods.
Date: 2023-10-04T17:30:50Z
- Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [20.86345962679122]
Estimating the transferability of publicly available pretrained models to a target task has become an important part of transfer learning.
We propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task.
Date: 2023-09-05T17:57:31Z
- Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC).
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
Date: 2022-12-14T07:35:20Z
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
Date: 2022-03-30T23:16:07Z
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
Date: 2022-02-08T19:18:49Z
- Consolidated learning -- a domain-specific model-free optimization strategy with examples for XGBoost and MIMIC-IV [4.370097023410272]
This paper proposes a new formulation of the tuning problem, called consolidated learning.
In practical settings where models are tuned for many related tasks, we are interested in the total optimization time rather than in tuning for a single task.
We demonstrate the effectiveness of this approach through an empirical study for XGBoost algorithm and the collection of predictive tasks extracted from the MIMIC-IV medical database.
Date: 2022-01-27T21:38:53Z
- Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
Date: 2021-03-18T14:26:26Z