Related papers: Zero-shot Task Preference Addressing Enabled by Imprecise Bayesian Continual Learning

Zero-shot Task Preference Addressing Enabled by Imprecise Bayesian Continual Learning

URL: http://arxiv.org/abs/2305.14782v1
Date: Wed, 24 May 2023 06:39:00 GMT
Title: Zero-shot Task Preference Addressing Enabled by Imprecise Bayesian Continual Learning
Authors: Pengyuan Lu and Michele Caprio and Eric Eaton and Insup Lee
Abstract summary: We propose Imprecise Bayesian Continual Learning (IBCL) to address preferences on task-performance trade-offs. IBCL does not require any additional training overhead to construct preference-addressing models from its knowledge base. We show that models obtained by IBCL have guarantees in identifying the preferred parameters.
Score: 19.11678487931003
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Like generic multi-task learning, continual learning has the nature of multi-objective optimization, and therefore faces a trade-off between the performance of different tasks. That is, to optimize for the current task distribution, it may need to compromise performance on some tasks to improve on others. This means there exist multiple models that are each optimal at different times, each addressing a distinct task-performance trade-off. Researchers have discussed how to train particular models to address specific preferences on these trade-offs. However, existing algorithms require additional sample overheads -- a large burden when there are multiple, possibly infinitely many, preferences. As a response, we propose Imprecise Bayesian Continual Learning (IBCL). Upon a new task, IBCL (1) updates a knowledge base in the form of a convex hull of model parameter distributions and (2) obtains particular models to address preferences with zero-shot. That is, IBCL does not require any additional training overhead to construct preference-addressing models from its knowledge base. We show that models obtained by IBCL have guarantees in identifying the preferred parameters. Moreover, experiments show that IBCL is able to locate the Pareto set of parameters given a preference, maintain similar to better performance than baseline methods, and significantly reduce training overhead via zero-shot preference addressing.

Related papers

Optimize Incompatible Parameters through Compatibility-aware Knowledge Integration [104.52015641099828]
Existing research excels in removing such parameters or merging the outputs of multiple different pretrained models. We propose Compatibility-aware Knowledge Integration (CKI), which consists of Deep Assessment and Deep Splicing. The integrated model can be used directly for inference or for further fine-tuning.
arXiv Detail & Related papers (2025-01-10T01:42:43Z)
Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model [18.111868378615206]
We propose a pairwise few-shot ranker that achieves a close performance to that of a supervised model without requiring any complex training pipeline. Our method also achieves a close performance to that of a supervised model without requiring any complex training pipeline.
arXiv Detail & Related papers (2024-09-26T11:19:09Z)
Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling. Our research explores task-specific model pruning to inform decisions about designing SMoE architectures. We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP) MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved. We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z)
Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process. vanilla reference-model-free methods involve independently scoring and selecting data in a sample-wise manner. We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
IBCL: Zero-shot Model Generation for Task Trade-offs in Continual Learning [15.77524891010002]
We propose Imprecise Bayesian Continual Learning (IBCL) to address task trade-off preferences. IBCL does not require any additional training overhead to generate preference-addressing models from its knowledge base. IBCL improves average per-task accuracy by at most 23% and peak per-task accuracy by at most 15% with respect to the baseline methods.
arXiv Detail & Related papers (2023-10-04T17:30:50Z)
Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [20.86345962679122]
Estimating the transferability of publicly available pretrained models to a target task has assumed an important place for transfer learning tasks. We propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task.
arXiv Detail & Related papers (2023-09-05T17:57:31Z)
Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Adaptive Task Adapting Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers. Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters. We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled. We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples. We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution. This approach poses a number of implementation and optimization challenges. We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.