Different Tunes Played with Equal Skill: Exploring a Unified
Optimization Subspace for Delta Tuning
- URL: http://arxiv.org/abs/2210.13311v1
- Date: Mon, 24 Oct 2022 14:57:35 GMT
- Title: Different Tunes Played with Equal Skill: Exploring a Unified
Optimization Subspace for Delta Tuning
- Authors: Jing Yi, Weize Chen, Yujia Qin, Yankai Lin, Ning Ding, Xu Han, Zhiyuan
Liu, Maosong Sun, Jie Zhou
- Abstract summary: Delta tuning (DET) is regarded as the new paradigm for using pre-trained language models (PLMs).
Up to now, various DETs with distinct design elements have been proposed, achieving performance on par with fine-tuning.
- Score: 95.72622659619445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Delta tuning (DET, also known as parameter-efficient tuning) is regarded as the
new paradigm for using pre-trained language models (PLMs). Up to now, various
DETs with distinct design elements have been proposed, achieving performance on
par with fine-tuning. However, the mechanisms behind the above success are
still under-explored, especially the connections among various DETs. To fathom
the mystery, we hypothesize that the adaptations of different DETs could all be
reparameterized as low-dimensional optimizations in a unified optimization
subspace, which could be found by jointly decomposing independent solutions of
different DETs. Then we explore the connections among different DETs by
conducting optimization within the subspace. In experiments, we find that, for
a certain DET, conducting optimization simply in the subspace could achieve
comparable performance to its original space, and the found solution in the
subspace could be transferred to another DET and achieve non-trivial
performance. We also visualize the performance landscape of the subspace and
find that there exists a substantial region where different DETs all perform
well. Finally, we extend our analysis and show the strong connections between
fine-tuning and DETs.
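
To make the abstract's recipe concrete, here is a minimal sketch of finding a shared low-dimensional subspace by jointly decomposing independent DET solutions and then reparameterizing optimization inside it. The abstract does not give implementation details, so the SVD-based decomposition, the flattening of delta parameters, and every function name below are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's implementation):
# 1) collect flattened delta-parameter solutions from several DET methods,
# 2) jointly decompose them to obtain a shared low-dimensional subspace,
# 3) reparameterize a DET as coordinates in that subspace and optimize there.

def find_shared_subspace(delta_solutions, dim):
    """delta_solutions: list of 1-D arrays, each a flattened tuned-parameter
    vector from one DET run (assumed here to have equal length)."""
    X = np.stack(delta_solutions)             # (num_solutions, num_params)
    X = X - X.mean(axis=0, keepdims=True)     # center before decomposition
    # SVD stands in for "jointly decomposing independent solutions"; the top
    # right-singular vectors span a shared low-dimensional subspace.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:dim]                           # (dim, num_params) basis

def delta_from_coords(coords, basis, anchor):
    """Map a low-dimensional point back to a full delta-parameter vector."""
    return anchor + coords @ basis

# Illustrative usage with random stand-ins for real DET solutions.
rng = np.random.default_rng(0)
solutions = [rng.normal(size=10_000) for _ in range(8)]   # e.g. adapter / LoRA / prefix runs
basis = find_shared_subspace(solutions, dim=5)
coords = np.zeros(5)                          # optimize these few numbers, not 10,000
delta = delta_from_coords(coords, basis, anchor=np.mean(solutions, axis=0))
```

Under these assumptions, optimizing only `coords` while keeping `basis` fixed corresponds to conducting optimization "simply in the subspace", and plugging a found `coords` into another DET's reparameterization mirrors the cross-method transfer experiment described in the abstract.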
Related papers
- Model Fusion through Bayesian Optimization in Language Model Fine-Tuning [16.86812534268461]
Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains.
We introduce a novel model fusion technique that optimizes both the desired metric and the loss through multi-objective Bayesian optimization.
Experiments across various downstream tasks show considerable performance improvements using our Bayesian-optimization-guided method.
arXiv Detail & Related papers (2024-11-11T04:36:58Z)
- Submodular Framework for Structured-Sparse Optimal Transport [7.030105924295838]
Unbalanced optimal transport (UOT) has recently gained much attention due to its flexible framework for handling unnormalized measures and its robustness.
In this work, we explore learning (structured) sparse transport plans in the UOT setting, i.e., transport plans with an upper bound on the number of non-sparse entries in each column.
We propose novel sparsity-constrained UOT formulations building on the recently explored mean-discrepancy-based UOT.
arXiv Detail & Related papers (2024-06-07T13:11:04Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers [50.23439411530435]
We show that Partial Fine-Tuning can be an innovative and promising direction capable of concurrently enhancing both efficiency and accuracy.
We propose a novel fine-tuned angle metric to guide the selection of appropriate layers for partial fine-tuning.
Comprehensive experiments on a wide range of datasets and models validate the great potential of partial fine-tuning.
arXiv Detail & Related papers (2023-12-25T10:11:34Z)
- Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model [81.55141188169621]
We equip PEFT with a cross-block orchestration mechanism to enable the adaptation of the Segment Anything Model (SAM) to various downstream scenarios.
We propose an intra-block enhancement module, which introduces a linear projection head whose weights are generated from a hyper-complex layer.
Our proposed approach consistently and significantly improves segmentation performance on novel scenarios with only around 1K additional parameters.
arXiv Detail & Related papers (2023-11-28T11:23:34Z)
- Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models [90.24999406296867]
In contrast with standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched (see the sketch after this list).
Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selection can achieve performance on a par with full-parameter fine-tuning.
arXiv Detail & Related papers (2022-03-14T07:56:32Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- High-Dimensional Bayesian Optimization via Nested Riemannian Manifolds [0.0]
We propose to exploit the geometry of non-Euclidean search spaces, which often arise in a variety of domains, to learn structure-preserving mappings.
Our approach features geometry-aware Gaussian processes that jointly learn a nested-manifold embedding and a representation of the objective function in the latent space.
arXiv Detail & Related papers (2020-10-21T11:24:11Z)
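
As referenced in the Delta Tuning survey entry above, here is a minimal sketch of what "fine-tuning only a small portion of the model parameters" can look like in practice. The bias-only (BitFit-style) selection, the toy model, and the training step are illustrative assumptions, not the survey's specific method.

```python
import torch
from torch import nn

# Illustrative delta-tuning setup: freeze the backbone and train only bias terms
# (a BitFit-style choice; adapters, LoRA, or prefix tuning are other common options).
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")   # keep all other weights untouched

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

# One toy step showing that only the small "delta" set of parameters is updated.
x = torch.randn(2, 8, 64)                         # (batch, sequence, hidden)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
```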
This list is automatically generated from the titles and abstracts of the papers on this site.