Earning Extra Performance from Restrictive Feedbacks
- URL: http://arxiv.org/abs/2304.14831v2
- Date: Fri, 28 Jul 2023 07:51:03 GMT
- Title: Earning Extra Performance from Restrictive Feedbacks
- Authors: Jing Li, Yuangang Pan, Yueming Lyu, Yinghua Yao, Yulei Sui, and Ivor
W. Tsang
- Abstract summary: We set up a challenge named Earning eXtra PerformancE from restriCTive feEDbacks (EXPECTED) to describe this class of model tuning problems.
The goal of the model provider is to eventually deliver a satisfactory model to the local user(s) by utilizing these feedbacks.
We propose to characterize the geometry of the model performance with respect to model parameters by exploring the parameters' distribution.
- Score: 41.05874087063763
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many machine learning applications encounter a situation where model
providers are required to further refine a previously trained model to satisfy
the specific needs of local users. This problem reduces to the standard model
tuning paradigm if the target data can be fed to the model directly. However,
in a wide range of practical cases the target data is not shared with the model
provider, although some evaluations of the model are commonly accessible. In
this paper, we formally set up a challenge
named \emph{Earning eXtra PerformancE from restriCTive feEDbacks} (EXPECTED)
to describe this class of model tuning problems. Concretely, EXPECTED allows a
model provider to access the operational performance of a candidate model
multiple times via feedback from a local user (or a group of users). The goal
of the model provider is to eventually deliver a satisfactory model to the
local user(s) by utilizing the feedbacks. Unlike existing model tuning methods
where the target data is always ready for calculating model gradients, the
model providers in EXPECTED only see feedbacks that could be as simple as
scalars, such as inference accuracy or usage rate. To enable tuning under this
restriction, we propose to characterize the geometry of the model performance
with respect to the model parameters by exploring the parameters'
distribution. In particular, for deep models, whose parameters are distributed
across multiple layers, we further tailor a more query-efficient algorithm that
tunes the model layer by layer, devoting more queries to the layers that pay
off better. Extensive experiments on different applications demonstrate that
our work offers a sound solution to the EXPECTED problem. Code
is available via https://github.com/kylejingli/EXPECTED.
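
The abstract's idea of exploring the parameters' distribution from scalar feedback resembles evolution-strategies-style zeroth-order optimization. The sketch below is a minimal illustration in that spirit, not the paper's algorithm: `query_feedback` is a hypothetical stand-in for the user's evaluation oracle, and the layerwise query allocation for deep models is omitted.

```python
import numpy as np

def query_feedback(theta):
    """Stand-in for the user's evaluation oracle. In EXPECTED this is
    the only signal the provider receives: a scalar such as inference
    accuracy or usage rate for the candidate parameters."""
    raise NotImplementedError  # supplied by the local user(s)

def expected_tune(theta, sigma=0.01, lr=0.05, pop=20, budget=200):
    """Tune parameters from scalar feedback alone by sampling a Gaussian
    around the current parameters and moving toward perturbations that
    score well (a natural-evolution-strategies estimate of the ascent
    direction on the feedback surface)."""
    rng = np.random.default_rng(0)
    used = 0
    while used + pop <= budget:
        eps = rng.standard_normal((pop, theta.size))
        scores = np.array([query_feedback(theta + sigma * e) for e in eps])
        used += pop
        # Standardize scores so the step size is scale-invariant.
        scores = (scores - scores.mean()) / (scores.std() + 1e-8)
        # Feedback-weighted average of perturbations approximates the
        # gradient of the expected score under the search distribution.
        grad = (scores[:, None] * eps).mean(axis=0) / sigma
        theta = theta + lr * grad
    return theta
```

Because only scalar scores flow back to the provider, the target data never leaves the user; the search distribution does all the exploration.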
Related papers
- Generating Model Parameters for Controlling: Parameter Diffusion for Controllable Multi-Task Recommendation [8.77762056359264]
PaDiRec allows the customization and adaptation of recommendation model parameters to new task requirements without retraining.
We utilize the diffusion model as a parameter generator, employing adapter-free guidance in conditional training to learn the distribution of optimized model parameters.
As a model-agnostic approach, PaDiRec can leverage existing recommendation models as backbones to enhance their controllability.
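As a rough, assumed illustration of a diffusion model as a parameter generator, the sketch below runs one DDPM-style noise-prediction training step on a flattened, already-optimized parameter vector conditioned on a task embedding; `denoiser` and the conditioning format are hypothetical, and PaDiRec's adapter-free guidance is not reproduced.

```python
import numpy as np

def diffusion_param_loss(denoiser, params, task_emb, T=1000, rng=None):
    """One noise-prediction training step: corrupt an optimized
    parameter vector at a random timestep and train the (hypothetical)
    denoiser to recover the injected noise, conditioned on the task."""
    rng = rng or np.random.default_rng()
    t = int(rng.integers(1, T))
    alpha_bar = np.cos(0.5 * np.pi * t / T) ** 2       # cosine schedule
    noise = rng.standard_normal(params.shape)
    noisy = np.sqrt(alpha_bar) * params + np.sqrt(1.0 - alpha_bar) * noise
    pred = denoiser(noisy, t, task_emb)                # predicts the noise
    return float(np.mean((pred - noise) ** 2))         # MSE objective
```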
arXiv Detail & Related papers (2024-10-14T15:50:35Z)
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Experts (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce UNCURL, an adaptive task-aware pruning technique that reduces the number of experts per MoE layer offline, after training.
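The summary leaves UNCURL's task-aware criterion abstract; as a minimal stand-in, the sketch below prunes each MoE layer to its most-used experts according to routing counts recorded offline on task data.

```python
import numpy as np

def prune_experts(router_counts, keep):
    """Given per-layer counts of tokens routed to each expert (collected
    offline on task data), keep the `keep` most-used experts per layer.
    Usage counts are a simple stand-in for a task-aware criterion."""
    kept = {}
    for layer, counts in router_counts.items():
        order = np.argsort(counts)[::-1]          # most-used first
        kept[layer] = sorted(order[:keep].tolist())
    return kept

# Example: layer 0 routed tokens to 4 experts; keep the top 2.
print(prune_experts({0: np.array([120, 5, 80, 15])}, keep=2))  # {0: [0, 2]}
```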
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
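One step such a construction plausibly needs is extracting a low-rank expert from a fine-tuned model with no extra data or training. The SVD-based sketch below is an assumed reading of that zero-shot step, not SMILE's actual construction or routing.

```python
import numpy as np

def low_rank_expert(w_base, w_finetuned, rank):
    """Compress a fine-tuned weight's deviation from the shared base
    weight into a rank-`rank` factor pair via truncated SVD; the pair
    can then act as one low-rank expert on top of the base model."""
    u, s, vt = np.linalg.svd(w_finetuned - w_base, full_matrices=False)
    a = u[:, :rank] * s[:rank]     # (out_dim, rank)
    b = vt[:rank]                  # (rank, in_dim)
    return a, b                    # expert applies x -> x @ b.T @ a.T
```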
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction [26.26211464623954]
Federated Importance-Aware Submodel Extraction (FIARSE) is a novel approach that dynamically adjusts submodels based on the importance of model parameters.
Compared to existing works, the proposed method offers a theoretical foundation for submodel extraction.
Extensive experiments are conducted on various datasets to showcase the superior performance of the proposed FIARSE.
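A minimal sketch of importance-aware submodel extraction follows, using parameter magnitude as the importance proxy; this is an assumption for illustration, not FIARSE's exact criterion.

```python
import numpy as np

def extract_submodel(weights, ratio):
    """Keep the globally most important fraction `ratio` of parameters,
    with magnitude as the importance proxy. A client with capacity
    `ratio` would receive only the surviving parameters."""
    flat = np.concatenate([w.ravel() for w in weights.values()])
    cutoff = np.quantile(np.abs(flat), 1.0 - ratio)
    return {name: np.abs(w) >= cutoff for name, w in weights.items()}

masks = extract_submodel({"fc": np.array([[0.5, -0.01], [0.2, 1.3]])}, ratio=0.5)
print(masks["fc"])  # boolean mask selecting the larger-magnitude half
```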
arXiv Detail & Related papers (2024-07-28T04:10:11Z)
- Studying How to Efficiently and Effectively Guide Models with Explanations [52.498055901649025]
'Model guidance' is the idea of regularizing a model's explanations to ensure that it is "right for the right reasons".
We conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets.
Specifically, we guide the models via bounding box annotations, which are much cheaper to obtain than the commonly used segmentation masks.
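The evaluated losses vary, but a common guidance term penalizes attribution mass that falls outside the annotation. The sketch below shows one such form, with `attribution` standing in for the output of any saliency method.

```python
import numpy as np

def guidance_loss(task_loss, attribution, box_mask, lam=1.0):
    """Add an 'energy outside the box' penalty to the task loss: the
    fraction of total attribution mass falling outside the annotated
    bounding box. `box_mask` is 1 inside the box, 0 outside."""
    attr = np.abs(attribution)
    outside = (attr * (1 - box_mask)).sum() / (attr.sum() + 1e-8)
    return task_loss + lam * outside
```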
arXiv Detail & Related papers (2023-03-21T15:34:50Z)
- Multidimensional Item Response Theory in the Style of Collaborative Filtering [0.8057006406834467]
This paper presents a machine learning approach to multidimensional item response theory (MIRT).
Inspired by collaborative filtering, we define a general class of models that includes many MIRT models.
We discuss the use of penalized joint maximum likelihood (JML) to estimate individual models and cross-validation to select the best performing model.
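As a minimal illustration of the collaborative-filtering view, the sketch below takes one penalized JML gradient step on a logistic MIRT model; the L2 penalties stand in for the paper's regularization choices.

```python
import numpy as np

def jml_step(theta, a, b, resp, lr=0.1, l2=0.01):
    """One penalized joint-maximum-likelihood ascent step for
    P(person i answers item j correctly) = sigmoid(theta_i . a_j + b_j).
    theta: (people, dims) abilities; a: (items, dims) discriminations;
    b: (items,) easiness; resp: (people, items) 0/1 responses."""
    p = 1.0 / (1.0 + np.exp(-(theta @ a.T + b)))
    err = resp - p                        # d log-likelihood / d logits
    g_theta = err @ a - l2 * theta
    g_a = err.T @ theta - l2 * a
    theta, a, b = theta + lr * g_theta, a + lr * g_a, b + lr * err.sum(0)
    return theta, a, b
```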
arXiv Detail & Related papers (2023-01-03T00:56:27Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models, but the data behind each fine-tuned model is often unavailable.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
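The paper's merging rule goes beyond a plain average, but the underlying parameter-space idea can be sketched as a weighted average of corresponding tensors across models that share an architecture:

```python
import numpy as np

def merge_in_parameter_space(state_dicts, weights=None):
    """Fuse same-architecture models by averaging corresponding
    parameter tensors: a baseline for dataless parameter-space merging
    (the paper's actual merging rule is more refined than this)."""
    k = len(state_dicts)
    weights = weights if weights is not None else [1.0 / k] * k
    return {name: sum(w * sd[name] for w, sd in zip(weights, state_dicts))
            for name in state_dicts[0]}

merged = merge_in_parameter_space(
    [{"fc.weight": np.ones((2, 2))}, {"fc.weight": 3 * np.ones((2, 2))}])
print(merged["fc.weight"])  # averaged weights: all 2.0
```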
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Model Reuse with Reduced Kernel Mean Embedding Specification [70.044322798187]
We present a two-phase framework for finding helpful models for a current application.
In the upload phase, when a model is uploaded into the pool, we construct a reduced kernel mean embedding (RKME) as a specification for the model.
Then in the deployment phase, the relatedness of the current task to each pre-trained model is measured based on the value of its RKME specification.
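The relatedness measure can be read as an RKHS distance between the task's empirical mean embedding and the model's reduced specification. A minimal sketch under that reading, with an assumed RBF kernel, is below.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel matrix between sample sets (n, d) and (m, d)."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def rkme_distance(task_x, spec_z, spec_beta, gamma=1.0):
    """Squared RKHS distance between the current task's empirical mean
    embedding and a model's RKME specification (weighted pseudo-samples
    z with coefficients beta); a smaller distance suggests a more
    related pre-trained model."""
    n = len(task_x)
    t1 = rbf(task_x, task_x, gamma).sum() / n**2
    t2 = spec_beta @ rbf(spec_z, spec_z, gamma) @ spec_beta
    t3 = 2.0 / n * (rbf(task_x, spec_z, gamma) @ spec_beta).sum()
    return t1 + t2 - t3
```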
arXiv Detail & Related papers (2020-01-20T15:15:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.