Data-oriented Dynamic Fine-tuning Parameter Selection Strategy for FISH
Mask based Efficient Fine-tuning
- URL: http://arxiv.org/abs/2403.08484v1
- Date: Wed, 13 Mar 2024 12:50:23 GMT
- Title: Data-oriented Dynamic Fine-tuning Parameter Selection Strategy for FISH
Mask based Efficient Fine-tuning
- Authors: Ming Dong, Kang Xue, Bolong Zheng, Tingting He
- Abstract summary: We propose an IRD algorithm to search for the best sample-parameter pair setting for FISH Mask.
We demonstrate the effectiveness and rationality of the proposed strategy through experiments on the GLUE benchmark.
- Score: 9.423534576254712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the huge number of parameters in large language models
(LLMs), tuning all parameters is very costly, so fine-tuning specific
parameters is more sensible. Most parameter-efficient fine-tuning (PEFT)
methods concentrate on parameter selection strategies, such as additive,
selective, and reparametrization-based methods. However, few methods consider
the impact of data samples on parameter selection; the FISH Mask based method
is one of them. FISH Mask randomly chooses a subset of data samples and treats
them equally during parameter selection, so it cannot dynamically select
optimal parameters for varying data distributions. In this work, we adopt a
data-oriented perspective and propose an IRD ($\mathrm{\underline
I}$terative sample-parameter $\mathrm{\underline R}$ange $\mathrm{\underline
D}$ecreasing) algorithm to search for the best sample-parameter pair setting
for FISH Mask. In each iteration, by searching for the set of samples and
parameters with larger Fisher information, IRD can find better
sample-parameter pairs at most scales. We demonstrate the effectiveness and
rationality of the proposed strategy through experiments on the GLUE
benchmark. Experimental results show that our strategy optimizes parameter
selection and achieves preferable performance.
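The iterative range-decreasing idea in the abstract can be sketched in code. This is a hypothetical simplification, not the paper's exact procedure: it uses squared per-sample gradients as a Fisher-information proxy, and a halving schedule in which each iteration keeps the half of the sample (then parameter) range with the larger total Fisher mass. All names (`fisher_scores`, `ird_search`) and the schedule itself are illustrative assumptions.

```python
# Hedged sketch of an IRD-style (Iterative Range Decreasing) search.
# Assumption: Fisher information per (sample, parameter) is approximated
# by the squared per-sample gradient, as in common FISH Mask practice.
import numpy as np

def fisher_scores(grads):
    """Approximate Fisher information as squared gradients.
    grads: array of shape (n_samples, n_params)."""
    return grads ** 2

def ird_search(grads, target_samples, target_params):
    """Iteratively halve the sample range, then the parameter range,
    keeping the half with the larger total Fisher information."""
    scores = fisher_scores(grads)
    samples = np.arange(scores.shape[0])
    params = np.arange(scores.shape[1])
    while len(samples) > target_samples:
        # Fisher mass of each remaining sample over the remaining parameters.
        mass = scores[samples][:, params].sum(axis=1)
        keep = max(target_samples, len(samples) // 2)
        samples = samples[np.argsort(mass)[::-1][:keep]]
    while len(params) > target_params:
        # Fisher mass of each remaining parameter over the kept samples.
        mass = scores[samples][:, params].sum(axis=0)
        keep = max(target_params, len(params) // 2)
        params = params[np.argsort(mass)[::-1][:keep]]
    return np.sort(samples), np.sort(params)

# Toy usage: 8 samples x 6 parameters of random gradients.
rng = np.random.default_rng(0)
g = rng.normal(size=(8, 6))
s, p = ird_search(g, target_samples=2, target_params=3)
print(len(s), len(p))  # 2 3
```

In a real PEFT setting the kept parameter indices would define the sparse mask of tunable weights, while the kept samples would be the ones used to estimate it in the next round.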
Related papers
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of the studied optimizers and parameterizations.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO)
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z) - Efficient and Robust Bayesian Selection of Hyperparameters in Dimension
Reduction for Visualization [0.0]
We introduce an efficient and robust auto-tuning framework for hyperparameter selection in dimension reduction (DR) algorithms.
Our approach enables efficient hyperparameter selection with multi-objective trade-offs and allows us to perform data-driven analysis.
We evaluate our results on various synthetic and real-world datasets using multiple quality metrics.
arXiv Detail & Related papers (2023-06-01T05:36:22Z) - Parameter-Efficient Fine-Tuning without Introducing New Latency [7.631596468553607]
We introduce a novel adapter technique that directly applies the adapter to pre-trained parameters instead of the hidden representation.
Our proposed method attains a new state-of-the-art outcome in terms of both performance and storage efficiency, storing only 0.03% parameters of full fine-tuning.
arXiv Detail & Related papers (2023-05-26T08:44:42Z) - Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme.
SPT allocates trainable parameters to task-specific important positions.
Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
arXiv Detail & Related papers (2023-03-15T12:34:24Z) - On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them.
Despite the effectiveness of sparsity grounded by our theory, it still remains an open problem of how to choose the tunable parameters.
arXiv Detail & Related papers (2022-11-28T17:41:48Z) - HPS-Det: Dynamic Sample Assignment with Hyper-Parameter Search for
Object Detection [25.71039912705784]
We propose a novel dynamic sample assignment scheme based on hyperparameter search.
Experiments demonstrate that the resulting HPS-Det brings improved performance over different object detection baselines.
arXiv Detail & Related papers (2022-07-23T15:13:57Z) - AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient
Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyperparameter tuning.
We show that using gradient-based data subsets for hyperparameter tuning achieves significantly faster turnaround times, with speedups of 3$\times$-30$\times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z) - Hyperparameter Selection for Subsampling Bootstraps [0.0]
A subsampling method like BLB serves as a powerful tool for assessing the quality of estimators for massive data.
The performance of the subsampling methods are highly influenced by the selection of tuning parameters.
We develop a hyperparameter selection methodology, which can be used to select tuning parameters for subsampling methods.
Both simulation studies and real data analysis demonstrate the superior advantage of our method.
arXiv Detail & Related papers (2020-06-02T17:10:45Z) - Multi-Objective Hyperparameter Tuning and Feature Selection using Filter
Ensembles [0.8029049649310213]
We treat feature selection as a multi-objective optimization task.
The first uses multi-objective model-based optimization.
The second is an evolutionary NSGA-II-based wrapper approach to feature selection.
arXiv Detail & Related papers (2019-12-30T13:04:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.