KronA: Parameter Efficient Tuning with Kronecker Adapter
- URL: http://arxiv.org/abs/2212.10650v1
- Date: Tue, 20 Dec 2022 20:56:52 GMT
- Title: KronA: Parameter Efficient Tuning with Kronecker Adapter
- Authors: Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh
- Abstract summary: We introduce KronA, a Kronecker product-based adapter module for efficient fine-tuning of Transformer-based PLMs.
We apply the proposed methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.
- Score: 17.175408603709712
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning a Pre-trained Language Model (PLM) on a specific downstream task
has been a well-known paradigm in Natural Language Processing. However, with
the ever-growing size of PLMs, training the entire model on several downstream
tasks becomes very expensive and resource-hungry. Recently, different Parameter
Efficient Tuning (PET) techniques have been proposed to improve the efficiency of
fine-tuning PLMs. One popular category of PET methods is low-rank adaptation,
which inserts learnable truncated-SVD modules into the original model either
sequentially or in parallel. However, low-rank
decomposition suffers from limited representation power. In this work, we
address this problem using the Kronecker product instead of the low-rank
representation. We introduce KronA, a Kronecker product-based adapter module
for efficient fine-tuning of Transformer-based PLMs. We apply the proposed
methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the
Kronecker-based modules can outperform state-of-the-art PET methods.
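To make the contrast concrete, the sketch below illustrates the two kinds of update in PyTorch: a LoRA-style low-rank update, delta_W = B A, and a Kronecker-product update, delta_W = A kron B, of the kind KronA builds on. This is a minimal illustration under assumed shapes and initialization, not the authors' implementation; the class names, the 768-dimensional projection, and the factor shapes are made up for the example.

```python
# Minimal sketch (illustrative, not the authors' released code): contrast a
# LoRA-style low-rank update with a Kronecker-product (KronA-style) update.
import torch
import torch.nn as nn


class LoRAStyleAdapter(nn.Module):
    """Low-rank update: delta_W = B @ A, so rank(delta_W) <= r."""

    def __init__(self, d_in: int, d_out: int, r: int = 8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # (r, d_in)
        self.B = nn.Parameter(torch.zeros(d_out, r))         # (d_out, r)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., d_in) -> (..., d_out)
        return x @ self.A.t() @ self.B.t()


class KroneckerAdapter(nn.Module):
    """Kronecker update: delta_W = A kron B, with A: (a1, a2), B: (b1, b2),
    a1*b1 = d_out and a2*b2 = d_in.  Since rank(A kron B) = rank(A) * rank(B),
    the update is not restricted to low rank for a comparable parameter count."""

    def __init__(self, a_shape=(64, 64), b_shape=(12, 12)):
        super().__init__()
        self.A = nn.Parameter(torch.randn(*a_shape) * 0.01)
        self.B = nn.Parameter(torch.zeros(*b_shape))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta_w = torch.kron(self.A, self.B)  # (d_out, d_in), materialized here
        return x @ delta_w.t()                # only for clarity of the sketch


# Usage: add the adapter output to a frozen pretrained projection (parallel insertion).
frozen = nn.Linear(768, 768, bias=False)
frozen.weight.requires_grad_(False)
adapter = KroneckerAdapter(a_shape=(64, 64), b_shape=(12, 12))  # 64 * 12 = 768
x = torch.randn(2, 768)
y = frozen(x) + adapter(x)  # h = W x + (A kron B) x
```

Note that in practice the product (A kron B) x can be computed without materializing the full matrix; the sketch materializes it only to keep the example short.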
Related papers
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Parameter-Efficient Fine-Tuning (PEFT) method that effectively adapts large pre-trained models for downstream tasks.
We propose a novel approach that employs a low rank tensor parametrization for model updates.
Our method is both efficient and effective for fine-tuning large language models, achieving a substantial reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z)
- Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models [73.88009808326387]
We propose a novel spectrum-aware adaptation framework for generative models.
Our method adjusts both singular values and their basis vectors of pretrained weights.
We introduce Spectral Ortho Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity.
arXiv Detail & Related papers (2024-05-31T17:43:35Z)
- SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models [1.2263658159556594]
Full fine-tuning is a popular approach to adapt Transformer-based pre-trained large language models to a specific downstream task.
We propose Stratified Progressive Adaptation Fine-tuning (SPAFIT) based on the localization of different types of linguistic knowledge.
Our experiments, conducted on nine tasks from the GLUE benchmark, show that our proposed SPAFIT method outperforms other PEFT methods.
arXiv Detail & Related papers (2024-04-30T21:07:32Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models [79.34513906324727]
In this paper, we aim at parameter- and computation-efficient transfer learning (PCETL) for vision-language pre-trained models.
We propose a novel dynamic architecture skipping (DAS) approach towards effective PCETL.
arXiv Detail & Related papers (2023-09-04T09:34:33Z)
- Exploring the Impact of Model Scaling on Parameter-Efficient Tuning [100.61202305296275]
Parameter-Efficient Tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by training only minimal parameters.
In small PLMs, there are usually noticeable performance differences among PET methods.
We introduce a more flexible PET method, the Arbitrary PET (APET) method.
arXiv Detail & Related papers (2023-06-04T10:10:54Z)
- PVP: Pre-trained Visual Parameter-Efficient Tuning [29.05396521860764]
Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks.
It is still highly challenging to fully fine-tune these models for downstream tasks due to their high computational and storage costs.
We propose a Pre-trained Visual Parameter-efficient (PVP) Tuning framework, which pre-trains the parameter-efficient tuning modules first and then leverages the pre-trained modules.
arXiv Detail & Related papers (2023-04-26T15:55:29Z)
- CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models [62.60723685118747]
Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data.
We propose an efficient tuning method specifically designed for SSL speech models, applying CNN adapters at the feature extractor.
We empirically find that adding the CNN adapters to the feature extractor helps adaptation on emotion and speaker tasks.
arXiv Detail & Related papers (2022-12-01T08:50:12Z)
- Parameter-Efficient Tuning on Layer Normalization for Pre-trained Language Models [1.7185989606499712]
We first propose LN-tuning, which tunes the gain and bias terms of the Layer Normalization modules with only 0.03% of the parameters.
We study a unified framework that combines LN-tuning with previous methods and find that (1) combining prefix-tuning, the adapter-based method working on MHA, and LN-tuning achieves SOTA performance (a brief illustrative sketch of LN-tuning follows this list).
arXiv Detail & Related papers (2022-11-16T05:31:49Z)
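As a concrete illustration of the LN-tuning idea summarized in the last entry above, the hypothetical sketch below freezes an entire pretrained Transformer and re-enables gradients only for the LayerNorm gain and bias parameters. The checkpoint name and the Hugging Face API usage are assumptions for the example, not the paper's code; in practice the task-specific head would typically also remain trainable.

```python
# Minimal sketch (assumption, not the paper's code): LN-tuning trains only the
# gain (weight) and bias of every LayerNorm module in a frozen pretrained model.
import torch.nn as nn
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Freeze every parameter of the pretrained model.
for p in model.parameters():
    p.requires_grad = False

# Re-enable gradients only for LayerNorm gain and bias terms.
trainable = 0
for module in model.modules():
    if isinstance(module, nn.LayerNorm):
        for p in module.parameters():
            p.requires_grad = True
            trainable += p.numel()

total = sum(p.numel() for p in model.parameters())
print(f"LN-tuned parameters: {trainable} / {total} ({100 * trainable / total:.3f}%)")
```

For a BERT-base-sized encoder this leaves only a few tens of thousands of trainable parameters, on the order of the 0.03% figure quoted in the entry.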