Parameter-Efficient Fine-Tuning Design Spaces
- URL: http://arxiv.org/abs/2301.01821v1
- Date: Wed, 4 Jan 2023 21:00:18 GMT
- Title: Parameter-Efficient Fine-Tuning Design Spaces
- Authors: Jiaao Chen, Aston Zhang, Xingjian Shi, Mu Li, Alex Smola, Diyi Yang
- Abstract summary: We present a parameter-efficient fine-tuning design paradigm and discover design patterns that are applicable to different experimental settings.
We show experimentally that these methods consistently and significantly outperform investigated parameter-efficient fine-tuning strategies across different backbone models and different tasks in natural language processing.
- Score: 63.954953653386106
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Parameter-efficient fine-tuning aims to achieve performance comparable to
fine-tuning, using fewer trainable parameters. Several strategies (e.g.,
Adapters, prefix tuning, BitFit, and LoRA) have been proposed. However, their
designs are hand-crafted separately, and it remains unclear whether certain
design patterns exist for parameter-efficient fine-tuning. Thus, we present a
parameter-efficient fine-tuning design paradigm and discover design patterns
that are applicable to different experimental settings. Instead of focusing on
designing another individual tuning strategy, we introduce parameter-efficient
fine-tuning design spaces that parameterize tuning structures and tuning
strategies. Specifically, any design space is characterized by four components:
layer grouping, trainable parameter allocation, tunable groups, and strategy
assignment. Starting from an initial design space, we progressively refine the
space based on the model quality of each design choice and make greedy
selection at each stage over these four components. We discover the following
design patterns: (i) group layers in a spindle pattern; (ii) allocate the
number of trainable parameters to layers uniformly; (iii) tune all the groups;
(iv) assign proper tuning strategies to different groups. These design patterns
result in new parameter-efficient fine-tuning methods. We show experimentally
that these methods consistently and significantly outperform investigated
parameter-efficient fine-tuning strategies across different backbone models and
different tasks in natural language processing.
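To make the four design-space components concrete, the sketch below encodes one design point in the way the abstract describes. It is a minimal, hypothetical illustration in Python, not the authors' code: the class, function, and field names, the group sizes, and the per-group strategy choices are all assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PEFTDesign:
    """One point in the design space, following the four components
    named in the abstract (field names are illustrative)."""
    layer_groups: List[List[int]]   # layer grouping: which layers form each group
    params_per_layer: List[int]     # trainable-parameter allocation per layer
    tunable_groups: List[int]       # indices of the groups that are tuned
    group_strategies: List[str]     # tuning strategy assigned to each group

def spindle_groups(num_layers: int) -> List[List[int]]:
    """Illustrative 'spindle' grouping: fewer layers in the outer groups,
    more in the middle ones (an assumption about the pattern's shape)."""
    weights = [1, 2, 2, 1]
    total = sum(weights)
    sizes = [max(1, round(num_layers * w / total)) for w in weights]
    sizes[-1] = num_layers - sum(sizes[:-1])  # absorb rounding error
    groups, start = [], 0
    for size in sizes:
        groups.append(list(range(start, start + size)))
        start += size
    return groups

num_layers, budget = 12, 12_000
design = PEFTDesign(
    layer_groups=spindle_groups(num_layers),                  # (i) spindle grouping
    params_per_layer=[budget // num_layers] * num_layers,     # (ii) uniform allocation
    tunable_groups=[0, 1, 2, 3],                              # (iii) tune all groups
    group_strategies=["bitfit", "adapter", "lora", "prefix"], # (iv) per-group strategies
)
print(design.layer_groups)  # [[0, 1], [2, 3, 4, 5], [6, 7, 8, 9], [10, 11]]
```

Under this encoding, the reported patterns correspond to a spindle-shaped `layer_groups`, a uniform `params_per_layer`, a `tunable_groups` list covering every group, and a per-group entry in `group_strategies`; the particular strategies assigned above are illustrative.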
Related papers
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of the investigated optimizers and parameterizations.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
- Compositional Generative Inverse Design
Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem.
We show that by optimizing over the learned energy function captured by the diffusion model, rather than directly over a learned forward model, we can avoid adversarial examples in the resulting designs.
In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method can design initial states and boundary shapes that are more complex than those seen during training.
arXiv Detail & Related papers (2024-01-24T01:33:39Z)
- Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model [81.55141188169621]
We equip PEFT with a cross-block orchestration mechanism to enable the adaptation of the Segment Anything Model (SAM) to various downstream scenarios.
We propose an intra-block enhancement module, which introduces a linear projection head whose weights are generated from a hyper-complex layer.
Our proposed approach consistently and significantly improves segmentation performance in novel scenarios with only around 1K additional parameters.
arXiv Detail & Related papers (2023-11-28T11:23:34Z)
- Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance [1.370633147306388]
The tree-structured Parzen estimator (TPE) is widely used in recent parameter-tuning frameworks.
Despite its popularity, the role of each control parameter and the intuition behind the algorithm have not been discussed so far.
arXiv Detail & Related papers (2023-04-21T17:02:38Z)
- DepGraph: Towards Any Structural Pruning [68.40343338847664]
We study general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs, and Transformers.
We propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group parameters for pruning.
In this work, we extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision Transformer for images, GAT for graphs, DGCNN for 3D point clouds, alongside LSTM for language, and demonstrate that, even with a simple norm-based pruning criterion, the method performs consistently well.
arXiv Detail & Related papers (2023-01-30T14:02:33Z)
- Meta Reinforcement Learning for Optimal Design of Legged Robots [9.054187238463212]
We present a design optimization framework using model-free meta reinforcement learning.
We show that our approach allows higher performance while not being constrained by predefined motions or gait patterns.
arXiv Detail & Related papers (2022-10-06T08:37:52Z)
- Parameter Tuning Strategies for Metaheuristic Methods Applied to Discrete Optimization of Structural Design [0.0]
This paper presents several strategies to tune the parameters of metaheuristic methods for (discrete) design optimization of reinforced concrete (RC) structures.
A novel utility metric is proposed, based on the area under the average performance curve.
arXiv Detail & Related papers (2021-10-12T17:34:39Z)
- Towards a Unified View of Parameter-Efficient Transfer Learning [108.94786930869473]
Fine-tuning large pre-trained language models on downstream tasks has become the de facto learning paradigm in NLP.
Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance.
We break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them; a minimal illustrative sketch of this shared structure appears after this list.
arXiv Detail & Related papers (2021-10-08T20:22:26Z)
- MO-PaDGAN: Reparameterizing Engineering Designs for Augmented Multi-objective Optimization [13.866787416457454]
Multi-objective optimization is key to solving many engineering design problems.
Deep generative models can learn compact design representations.
MO-PaDGAN adds a Determinantal Point Process-based loss function to the generative adversarial network.
arXiv Detail & Related papers (2020-09-15T13:58:31Z)
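As referenced in the "Towards a Unified View of Parameter-Efficient Transfer Learning" entry above, that paper's framework treats adapters, prefix tuning, and LoRA as different ways of adding a small learned modification to the hidden states of a frozen backbone. The PyTorch sketch below is a minimal, hypothetical illustration of that shared structure, not code from any of the cited papers; the class names, hyperparameters, and wiring are assumptions.

```python
import torch
from torch import nn

class LowRankDelta(nn.Module):
    """LoRA-style modification: a low-rank update computed from the sublayer input."""
    def __init__(self, d_model: int, rank: int = 8, scale: float = 1.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_model, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, d_model))  # zero-init: no change at start
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x @ self.A @ self.B) * self.scale

class BottleneckDelta(nn.Module):
    """Adapter-style modification: a small bottleneck applied to the sublayer output."""
    def __init__(self, d_model: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.up(torch.relu(self.down(h)))

class ModifiedSublayer(nn.Module):
    """Frozen sublayer whose output is adjusted by a small trainable delta module."""
    def __init__(self, frozen: nn.Module, delta: nn.Module, delta_on_input: bool = False):
        super().__init__()
        self.frozen = frozen
        for p in self.frozen.parameters():
            p.requires_grad_(False)  # only the delta module receives gradients
        self.delta = delta
        self.delta_on_input = delta_on_input  # LoRA acts on the input, adapters on the output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.frozen(x)
        return h + (self.delta(x) if self.delta_on_input else self.delta(h))

# Usage: the same wrapper expresses an adapter-like and a LoRA-like modification.
adapter_layer = ModifiedSublayer(nn.Linear(64, 64), BottleneckDelta(64))
lora_layer = ModifiedSublayer(nn.Linear(64, 64), LowRankDelta(64), delta_on_input=True)
print(adapter_layer(torch.randn(2, 64)).shape, lora_layer(torch.randn(2, 64)).shape)
```

Prefix tuning fits the same template less directly, acting through the attention computation rather than as a simple additive term on a single sublayer output, which is part of what the unified-view paper formalizes.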
This list is automatically generated from the titles and abstracts of the papers on this site.