Zero-shot Task Preference Addressing Enabled by Imprecise Bayesian
Continual Learning
- URL: http://arxiv.org/abs/2305.14782v1
- Date: Wed, 24 May 2023 06:39:00 GMT
- Title: Zero-shot Task Preference Addressing Enabled by Imprecise Bayesian
Continual Learning
- Authors: Pengyuan Lu and Michele Caprio and Eric Eaton and Insup Lee
- Abstract summary: We propose Imprecise Bayesian Continual Learning (IBCL) to address preferences on task-performance trade-offs.
IBCL does not require any additional training overhead to construct preference-addressing models from its knowledge base.
We show that models obtained by IBCL have guarantees in identifying the preferred parameters.
- Score: 19.11678487931003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Like generic multi-task learning, continual learning is inherently a
multi-objective optimization problem, and therefore faces a trade-off between the
performance of different tasks. That is, to optimize for the current task
distribution, a learner may need to compromise performance on some tasks to improve on
others. This means there exist multiple models, each optimal at a different time
and each addressing a distinct task-performance trade-off.
Researchers have discussed how to train particular models to address specific
preferences on these trade-offs. However, existing algorithms require
additional sample overheads -- a large burden when there are multiple, possibly
infinitely many, preferences. As a response, we propose Imprecise Bayesian
Continual Learning (IBCL). Upon a new task, IBCL (1) updates a knowledge base
in the form of a convex hull of model parameter distributions and (2) obtains
particular models that address given preferences in a zero-shot manner. That is, IBCL does not
require any additional training overhead to construct preference-addressing
models from its knowledge base. We show that models obtained by IBCL have
guarantees in identifying the preferred parameters. Moreover, experiments show
that IBCL is able to locate the Pareto set of parameters given a preference,
maintain performance similar to or better than the baseline methods, and significantly
reduce training overhead via zero-shot preference addressing.
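The two steps above can be pictured with a minimal sketch (Python/NumPy, purely illustrative). It assumes the knowledge base caches one Gaussian posterior over the flattened parameters per task as the extreme points of the convex hull, and that a preference is a probability vector over tasks; the function names and the simple mixture-sampling step are assumptions made for illustration, not the authors' implementation, and the paper's guarantees on identifying preferred parameters are not reproduced here.

    import numpy as np

    # Hypothetical knowledge base: one (mean, std) Gaussian posterior over the
    # flattened parameter vector per task, standing in for the extreme points of
    # the convex hull of parameter distributions mentioned in the abstract.
    knowledge_base = []  # list of (mu, sigma) pairs, one per task learned so far

    def update_knowledge_base(mu, sigma):
        """Step (1): after training on a new task, cache its posterior."""
        knowledge_base.append((np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)))

    def address_preference(preference, n_samples=8, rng=None):
        """Step (2), zero-shot: form the convex combination of cached posteriors
        given by the preference weights and sample candidate parameter vectors,
        with no additional gradient steps on any task."""
        if rng is None:
            rng = np.random.default_rng(0)
        w = np.asarray(preference, dtype=float)
        w = w / w.sum()  # normalize to a probability vector over tasks
        samples = []
        for _ in range(n_samples):
            k = rng.choice(len(knowledge_base), p=w)  # pick a mixture component
            mu, sigma = knowledge_base[k]
            samples.append(rng.normal(mu, sigma))     # draw parameters from it
        return samples

    # Example: two tasks already learned, user weights task 0 three-to-one.
    update_knowledge_base(mu=[0.0, 1.0], sigma=[0.1, 0.1])
    update_knowledge_base(mu=[1.0, 0.0], sigma=[0.2, 0.2])
    candidates = address_preference(preference=[0.75, 0.25])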
Related papers
- Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model [18.111868378615206]
We propose a pairwise few-shot ranker that achieves performance close to that of a supervised model without requiring any complex training pipeline.
arXiv Detail & Related papers (2024-09-26T11:19:09Z)
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Experts (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved; a generic sketch of coefficient-based merging is given after this list.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z)
- Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods involve independently scoring and selecting data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
- IBCL: Zero-shot Model Generation for Task Trade-offs in Continual Learning [15.77524891010002]
We propose Imprecise Bayesian Continual Learning (IBCL) to address task trade-off preferences.
IBCL does not require any additional training overhead to generate preference-addressing models from its knowledge base.
IBCL improves average per-task accuracy by at most 23% and peak per-task accuracy by at most 15% with respect to the baseline methods.
arXiv Detail & Related papers (2023-10-04T17:30:50Z)
- Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [20.86345962679122]
Estimating the transferability of publicly available pretrained models to a target task has assumed an important place for transfer learning tasks.
We propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task.
arXiv Detail & Related papers (2023-09-05T17:57:31Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z)
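As referenced in the MAP entry above, the following is a minimal, generic sketch of merging models with scaling coefficients and reading off the resulting task trade-off. The toy parameter deltas, the quadratic stand-in losses, and the brute-force coefficient sweep are illustrative assumptions only; MAP's contribution, per its abstract, is to amortize this front with a quadratic approximation rather than evaluate it exhaustively.

    import numpy as np

    def merge(base, deltas, coeffs):
        """Linear merging: base + sum_i c_i * delta_i (generic task-arithmetic style)."""
        merged = base.copy()
        for c, d in zip(coeffs, deltas):
            merged += c * d
        return merged

    def task_losses(params):
        """Stand-in per-task evaluation; the quadratic bowls are purely illustrative.
        Amortizing this kind of repeated evaluation is what MAP targets."""
        t1 = float(np.sum((params - np.array([1.0, 0.0])) ** 2))  # toy optimum of task 1
        t2 = float(np.sum((params - np.array([0.0, 1.0])) ** 2))  # toy optimum of task 2
        return t1, t2

    base = np.zeros(2)
    deltas = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # per-task parameter deltas

    # Sweeping coefficient pairs exposes the task trade-off (a crude Pareto front).
    for c1 in np.linspace(0.0, 1.0, 5):
        c2 = 1.0 - c1
        l1, l2 = task_losses(merge(base, deltas, [c1, c2]))
        print(f"c=({c1:.2f},{c2:.2f})  task1 loss={l1:.3f}  task2 loss={l2:.3f}")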
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.